NON-ENDOGENOUS, CONSTITUTIVELY ACTIVATED
HUMAN G PROTEIN-COUPLED RECEPTORS

This patent application is a continuation-in-part of, and claims priority from, U.S. Serial Number 09/170,496, filed with the United States Patent and Trademark Office on October 13, 1998. This application also claims the benefit of priority from the following provisional applications, all filed via U.S. Express Mail with the United States Patent and Trademark Office on the indicated dates: U.S. Provisional Number 60/110,060, filed November 27, 1998; U.S. Provisional Number 60/120,416, filed February 16, 1999; U.S. Provisional Number 60/121,852, filed February 26, 1999, claiming benefit of U.S. Provisional Number 60/109,213, filed November 20, 1998; U.S. Provisional Number 60/123,944, filed March 12, 1999; U.S. Provisional Number 60/123,945, filed March 12, 1999; U.S. Provisional Number 60/123,948, filed March 12, 1999; U.S. Provisional Number 60/123,951, filed March 12, 1999; U.S. Provisional Number 60/123,946, filed March 12, 1999; U.S. Provisional Number 60/123,949, filed March 12, 1999; U.S. Provisional Number 60/152,524, filed September 3, 1999, claiming benefit of U.S. Provisional Number 60/151,114, filed August 27, 1999 and U.S. Provisional Number 60/108,029, filed November 12, 1998; U.S. Provisional Number 60/136,436, filed May 28, 1999; U.S. Provisional Number 60/136,439, filed May 28, 1999; U.S. Provisional Number 60/136,567, filed May 28, 1999; U.S. Provisional Number 60/137,127, filed May 28, 1999; U.S. Provisional Number 60/137,131, filed May 28, 1999; U.S. Provisional Number



WO 00/22131 PCT/US99/24065 


60/141,448, filed June 29, 1999, claiming benefit of U.S. Provisional Number 60/136,437, filed May 28, 1999; U.S. Provisional Number 60/156,633, filed September 29, 1999; U.S. Provisional Number 60/156,555, filed September 29, 1999; U.S. Provisional Number 60/156,634, filed September 29, 1999; U.S. Provisional Number (Arena Pharmaceuticals, Inc. docket number: CHN10-1), filed September 29, 1999; U.S. Provisional Number (Arena Pharmaceuticals, Inc. docket number: RUP6-1), filed October 1, 1999; U.S. Provisional Number (Arena Pharmaceuticals, Inc. docket number: RUP7-1), filed October 1, 1999; U.S. Provisional Number (Arena Pharmaceuticals, Inc. docket number: CHN6-1), filed October 1, 1999; U.S. Provisional Number (Arena Pharmaceuticals, Inc. docket number: RUP5-1), filed October 1, 1999; and U.S. Provisional Number (Arena Pharmaceuticals, Inc. docket number: CHN9-1), filed October 1, 1999. This application is also related to co-pending U.S. Serial Number (Woodcock, Washburn, Kurtz, Makiewicz & Norris, LLP docket number AREN-0050), filed on October 12, 1999 (via U.S. Express Mail), and U.S. Serial Number 09/364,425, filed on July 30, 1999, both incorporated herein by reference. This application also claims priority to U.S. Serial Number (Woodcock, Washburn, Kurtz, Makiewicz & Norris, LLP docket number AREN-0054), filed on October 12, 1999 (via U.S. Express Mail), incorporated by reference herein in its entirety. Each of the foregoing applications is incorporated by reference herein in its entirety.

FIELD OF THE INVENTION

The invention disclosed in this patent document relates to transmembrane receptors, and more particularly to human G protein-coupled receptors, and specifically to GPCRs that have been altered to establish or enhance constitutive activity of the receptor. Preferably, the altered GPCRs are used for the direct identification of candidate compounds as receptor agonists, inverse agonists or partial agonists having potential applicability as therapeutic agents.

BACKGROUND OF THE INVENTION

Although a number of receptor classes exist in humans, by far the most abundant and therapeutically relevant is the G protein-coupled receptor (GPCR) class. It is estimated that there are some 100,000 genes within the human genome and that, of these, approximately 2%, or 2,000 genes, code for GPCRs. Receptors, including GPCRs, for which the endogenous ligand has been identified are referred to as "known" receptors, while receptors for which the endogenous ligand has not been identified are referred to as "orphan" receptors. GPCRs represent an important area for the development of pharmaceutical products: approximately 60% of all prescription pharmaceuticals have been developed from approximately 20 of the 100 known GPCRs.

GPCRs share a common structural motif. All these receptors have seven sequences of between 22 and 24 hydrophobic amino acids that form seven alpha helices, each of which spans the membrane (each span is identified by number, i.e., transmembrane-1 (TM-1), transmembrane-2 (TM-2), etc.). The transmembrane helices are joined by strands of amino acids between transmembrane-2 and transmembrane-3, transmembrane-4 and transmembrane-5, and transmembrane-6 and transmembrane-7 on the exterior, or "extracellular" side, of the cell membrane (these are referred to as "extracellular" regions 1, 2 and 3 (EC-1, EC-2 and EC-3), respectively). The transmembrane helices are also joined by strands of amino acids between transmembrane-1 and transmembrane-2, transmembrane-3 and transmembrane-4, and transmembrane-5 and transmembrane-6 on the interior, or "intracellular" side, of the cell membrane (these are referred to as "intracellular" regions 1, 2 and 3 (IC-1, IC-2 and IC-3), respectively). The "carboxy" ("C") terminus of the receptor lies in the intracellular space within the cell, and the "amino" ("N") terminus of the receptor lies in the extracellular space outside of the cell.
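The loop nomenclature above follows mechanically from the N-terminus lying outside the cell and each helix crossing the membrane once. The following is a minimal Python sketch, assuming an idealized seven-helix topology with no re-entrant segments; the function name `loop_topology` is hypothetical and introduced only for illustration. It derives the order and membrane side of each segment:

```python
def loop_topology(n_helices: int = 7) -> list[tuple[str, str]]:
    """Return (segment, side) pairs from N- to C-terminus for a receptor whose
    N-terminus is extracellular and whose helices each cross the membrane once."""
    segments = [("N-terminus", "extracellular")]
    ec = ic = 0
    for tm in range(1, n_helices + 1):
        segments.append((f"TM-{tm}", "membrane"))
        if tm == n_helices:
            break  # no loop follows the last helix
        # Odd-numbered helices run from outside to inside, so the loop after
        # TM-1, TM-3 and TM-5 is intracellular; the loop after an even helix is extracellular.
        if tm % 2 == 1:
            ic += 1
            segments.append((f"IC-{ic}", "intracellular"))
        else:
            ec += 1
            segments.append((f"EC-{ec}", "extracellular"))
    # After an odd number of crossings the chain ends inside the cell.
    c_side = "intracellular" if n_helices % 2 == 1 else "extracellular"
    segments.append(("C-terminus", c_side))
    return segments
```

Calling `loop_topology()` lists fifteen segments from the N-terminus to the C-terminus; IC-3 immediately follows TM-5 and EC-3 immediately follows TM-6, consistent with the loop definitions given above.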

Generally, when an endogenous ligand binds with the receptor (often referred to as "activation" of the receptor), there is a change in the conformation of the intracellular region that allows for coupling between the intracellular region and an intracellular "G protein." It has been reported that GPCRs are "promiscuous" with respect to G proteins, i.e., that a GPCR can interact with more than one G protein. See Kenakin, T., 43 Life Sciences 1095 (1988). Gq, Gs, Gi, Gz and Go are G proteins that have currently been identified, although other G proteins exist. Endogenous ligand-activated GPCR coupling with the G protein begins a signaling cascade process (referred to as "signal transduction"). Under normal conditions, signal transduction ultimately results in cellular activation or cellular inhibition. It is thought that the IC-3 loop as well as the carboxy terminus of the receptor interact with the G protein.

Under physiological conditions, GPCRs exist in the cell membrane in equilibrium between two different conformations: an "inactive" state and an "active" state. A receptor in an inactive state is unable to link to the intracellular signal transduction pathway to produce a biological response. Changing the receptor conformation to the active state allows linkage to the transduction pathway (via the G protein) and produces a biological response.
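The two-state equilibrium described above can be summarized with the standard simplified two-state receptor model. The equilibrium constant $K$ is notation introduced here for illustration only; it is not defined elsewhere in this document.

$$
R \rightleftharpoons R^{*}, \qquad K = \frac{[R^{*}]}{[R]}, \qquad
f_{\mathrm{active}} = \frac{[R^{*}]}{[R] + [R^{*}]} = \frac{K}{1 + K}
$$

Under this simplified model, a ligand-independent change that stabilizes the active state raises $K$ and therefore the fraction of receptor in the active state, for example from about 9% at $K = 0.1$ to 50% at $K = 1$.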

A receptor may be stabilized in an active state by an endogenous ligand or a compound such as a drug. Recent discoveries, including but not limited to modifications to the amino acid sequence of the receptor, provide means other than endogenous ligands or drugs to promote and stabilize the receptor in the active state conformation. These means effectively stabilize the receptor in an active state by simulating the effect of an endogenous ligand binding to the receptor. Stabilization by such ligand-independent means is termed "constitutive receptor activation."

SUMMARY OF THE INVENTION 
Disclosed herein are non-endogenous versions of endogenous, human GPCRs and uses thereof.

BRIEF DESCRIPTION OF THE DRAWINGS 
Figure 1 is a representation of the 8XCRE-Luc reporter plasmid (see Example 4(c)3).

Figures 2A and 2B are graphic representations of the results of ATP and ADP binding to endogenous TDAG8 (2A) and comparisons in serum and serum-free media (2B).

Figure 3 is a graphic representation of the comparative signaling results of CMV versus the GPCR Fusion Protein H9(F236K):Gsα.

DETAILED DESCRIPTION 

The scientific literature that has evolved around receptors has adopted a number of terms to refer to ligands having various effects on receptors. For clarity and consistency, the following definitions will be used throughout this patent document. To the extent that these definitions conflict with other definitions for these terms, the following definitions shall control:

AGONISTS shall mean materials (e.g., ligands, candidate compounds) that activate the intracellular response when they bind to the receptor, or enhance GTP binding to membranes.

AMINO ACID ABBREVIATIONS used herein are set out in Table A:

TABLE A

AMINO ACID       THREE-LETTER   ONE-LETTER
ALANINE          ALA            A
ARGININE         ARG            R
ASPARAGINE       ASN            N
ASPARTIC ACID    ASP            D
CYSTEINE         CYS            C
GLUTAMIC ACID    GLU            E
GLUTAMINE        GLN            Q
GLYCINE          GLY            G
HISTIDINE        HIS            H
ISOLEUCINE       ILE            I
LEUCINE          LEU            L
LYSINE           LYS            K
METHIONINE       MET            M
PHENYLALANINE    PHE            F
PROLINE          PRO            P
SERINE           SER            S
THREONINE        THR            T
TRYPTOPHAN       TRP            W
TYROSINE         TYR            Y
VALINE           VAL            V

PARTIAL AGONISTS shall mean materials (e.g., ligands, candidate compounds) that activate the intracellular response when they bind to the receptor to a lesser degree/extent than do agonists, or enhance GTP binding to membranes to a lesser degree/extent than do agonists.

ANTAGONIST shall mean materials (e.g., ligands, candidate compounds) that competitively bind to the receptor at the same site as the agonists but which do not activate the intracellular response initiated by the active form of the receptor, and can thereby inhibit the intracellular responses elicited by agonists or partial agonists. ANTAGONISTS do not diminish the baseline intracellular response in the absence of an agonist or partial agonist.

CANDIDATE COMPOUND shall mean a molecule (for example, and not limitation, a chemical compound) that is amenable to a screening technique. Preferably, the phrase "candidate compound" does not include compounds which were publicly known to be compounds selected from the group consisting of inverse agonist, agonist or antagonist to a receptor, as previously determined by an indirect identification process ("indirectly identified compound"); more preferably, not including an indirectly identified compound which has previously been determined to have therapeutic efficacy in at least one mammal; and, most preferably, not including an indirectly identified compound which has previously been determined to have therapeutic utility in humans.

COMPOSITION means a material comprising at least one component; a 
"pharmaceutical composition" is an example of a composition. 

COMPOUND EFFICACY shall mean a measurement of the ability of a compound 
to inhibit or stimulate receptor functionality, as opposed to receptor binding affinity. 
Exemplary means of detecting compound efficacy are disclosed in the Example section of this 
patent document. 

CODON shall mean a grouping of three nucleotides (or equivalents to nucleotides) 
which generally comprise a nucleoside (adenosine (A), guanosine (G), cytidine (C), uridine 
(U) and thymidine (T)) coupled to a phosphate group and which, when translated, encodes an 
amino acid. 
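As an illustration of how codons map to the amino acids of Table A, the following Python sketch translates a DNA coding sequence using the standard genetic code. It is a sketch under stated assumptions: the input is an in-frame DNA sequence read from its first base, U is accepted in place of T, "*" marks a stop codon, and the name `translate` is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, listed so that the first, second and third codon
# positions each iterate over T, C, A, G ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, stop_at_stop: bool = True) -> str:
    """Translate an in-frame coding sequence into one-letter amino acid codes."""
    seq = dna.upper().replace("U", "T")  # accept RNA-style input as well
    protein = []
    # Step through complete codons only; trailing bases are ignored.
    for pos in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE[seq[pos:pos + 3]]
        if aa == "*" and stop_at_stop:
            break
        protein.append(aa)
    return "".join(protein)
```

For example, `translate("ATGTTTAAATAG")` returns `"MFK"`: ATG encodes methionine, TTT phenylalanine, AAA lysine, and TAG is a stop codon.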

CONSTITUTIVELY ACTIVATED RECEPTOR shall mean a receptor subject to 
constitutive receptor activation. A constitutively activated receptor can be endogenous or non- 
endogenous. 

CONSTITUTIVE RECEPTOR ACTIVATION shall mean stabilization of a receptor in the active state by means other than binding of the receptor with its endogenous ligand or a chemical equivalent thereof.

CONTACT or CONTACTING shall mean bringing at least two moieties together, whether in an in vitro system or an in vivo system.

DIRECTLY IDENTIFYING or DIRECTLY IDENTIFIED, in relationship to the phrase "candidate compound," shall mean the screening of a candidate compound against a constitutively activated receptor, preferably a constitutively activated orphan receptor, and most preferably against a constitutively activated G protein-coupled cell surface orphan receptor, and assessing the compound efficacy of such compound. This phrase is, under no circumstances, to be interpreted or understood to be encompassed by or to encompass the phrase "indirectly identifying" or "indirectly identified."

ENDOGENOUS shall mean a material that a mammal naturally produces. ENDOGENOUS in reference to, for example and not limitation, the term "receptor," shall mean that which is naturally produced by a mammal (for example, and not limitation, a human) or a virus. By contrast, the term NON-ENDOGENOUS in this context shall mean that which is not naturally produced by a mammal (for example, and not limitation, a human) or a virus. For example, and not limitation, a receptor which is not constitutively active in its endogenous form, but when manipulated becomes constitutively active, is most preferably referred to herein as a "non-endogenous, constitutively activated receptor." Both terms can be utilized to describe both "in vivo" and "in vitro" systems. For example, and not limitation, in a screening approach, the endogenous or non-endogenous receptor may be in reference to an in vitro screening system. As a further example and not limitation, where the genome of a mammal has been manipulated to include a non-endogenous constitutively activated receptor, screening of a candidate compound by means of an in vivo system is viable.
G PROTEIN COUPLED RECEPTOR FUSION PROTEIN and GPCR FUSION PROTEIN, in the context of the invention disclosed herein, each mean a non-endogenous protein comprising an endogenous, constitutively activated GPCR or a non-endogenous, constitutively activated GPCR fused to at least one G protein, most preferably the alpha (α) subunit of such G protein (this being the subunit that binds GTP), with the G protein preferably being of the same type as the G protein that naturally couples with the endogenous orphan GPCR. For example, and not limitation, in an endogenous state, if the G protein "Gsα" is the predominant G protein that couples with the GPCR, a GPCR Fusion Protein based upon the specific GPCR would be a non-endogenous protein comprising the GPCR fused to Gsα; in some circumstances, as will be set forth below, a non-predominant G protein can be fused to the GPCR. The G protein can be fused directly to the C-terminus of the constitutively active GPCR or there may be spacers between the two.
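At the protein-sequence level, the fusion just defined amounts to appending the G protein alpha subunit to the receptor's C-terminus, with or without a spacer. The following Python sketch only illustrates that ordering; the function name, the toy sequences, and any spacer sequence are hypothetical placeholders, since the document does not specify a spacer, and an actual construct would be assembled at the DNA level in a Plasmid.

```python
def make_gpcr_fusion(receptor: str, g_alpha: str, spacer: str = "") -> str:
    """Fuse a G protein alpha subunit to the C-terminus of a GPCR.

    All arguments are protein sequences in one-letter code (Table A), written
    N- to C-terminus. An empty spacer gives a direct fusion.
    """
    # A translated stop ("*") at the end of the receptor would terminate the
    # chain before the G protein, so it is removed before joining.
    receptor = receptor.rstrip("*")
    if not receptor or not g_alpha:
        raise ValueError("receptor and G alpha sequences must be non-empty")
    return receptor + spacer + g_alpha
```

For instance, with toy sequences, `make_gpcr_fusion("MKT*", "MGC")` yields the direct fusion `"MKTMGC"`, while supplying a spacer inserts it between the receptor and the G protein.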

HOST CELL shall mean a cell capable of having a Plasmid and/or Vector incorporated therein. In the case of a prokaryotic Host Cell, a Plasmid is typically replicated as an autonomous molecule as the Host Cell replicates (generally, the Plasmid is thereafter isolated for introduction into a eukaryotic Host Cell); in the case of a eukaryotic Host Cell, a Plasmid is integrated into the cellular DNA of the Host Cell such that when the eukaryotic Host Cell replicates, the Plasmid replicates. Preferably, for the purposes of the invention disclosed herein, the Host Cell is eukaryotic, more preferably mammalian, and most preferably selected from the group consisting of 293, 293T and COS-7 cells.

INDIRECTLY IDENTIFYING or INDIRECTLY IDENTIFIED means the traditional approach to the drug discovery process involving identification of an endogenous ligand specific for an endogenous receptor, screening of candidate compounds against the receptor for determination of those which interfere and/or compete with the ligand-receptor interaction, and assessing the efficacy of the compound for affecting at least one second messenger pathway associated with the activated receptor.

INHIBIT or INHIBITING, in relationship to the term "response," shall mean that a response is decreased or prevented in the presence of a compound as opposed to in the absence of the compound.

INVERSE AGONISTS shall mean materials (e.g., ligands, candidate compounds) which bind to either the endogenous form of the receptor or to the constitutively activated form of the receptor, and which inhibit the baseline intracellular response initiated by the active form of the receptor below the normal base level of activity which is observed in the absence of agonists or partial agonists, or decrease GTP binding to membranes. Preferably, the baseline intracellular response is inhibited in the presence of the inverse agonist by at least 30%, more preferably by at least 50%, and most preferably by at least 75%, as compared with the baseline response in the absence of the inverse agonist.
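The preference tiers in this definition reduce to a percent-inhibition calculation against the baseline response. The following is a minimal Python sketch, assuming the baseline and treated responses are measured in the same arbitrary units (for example, a cAMP level or reporter signal). The function names are hypothetical.

```python
def percent_inhibition(baseline: float, with_compound: float) -> float:
    """Percent decrease of the baseline response in the presence of a compound."""
    if baseline <= 0:
        raise ValueError("baseline response must be positive")
    return 100.0 * (baseline - with_compound) / baseline


def inverse_agonist_tier(baseline: float, with_compound: float) -> str:
    """Map percent inhibition onto the 30% / 50% / 75% preference tiers."""
    inhibition = percent_inhibition(baseline, with_compound)
    if inhibition >= 75:
        return "most preferred (>=75%)"
    if inhibition >= 50:
        return "more preferred (>=50%)"
    if inhibition >= 30:
        return "preferred (>=30%)"
    return "below preferred threshold"
```

For example, a compound that lowers a baseline signal of 200 units to 40 units gives 80% inhibition and falls in the most preferred tier. A response above baseline would yield negative inhibition, which points toward agonist rather than inverse agonist activity.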

KNOWN RECEPTOR shall mean an endogenous receptor for which the endogenous ligand specific for that receptor has been identified.

LIGAND shall mean an endogenous, naturally occurring molecule specific for an 
endogenous, naturally occurring receptor. 

MUTANT or MUTATION in reference to an endogenous receptor's nucleic acid and/or amino acid sequence shall mean a specified change or changes to such endogenous sequences such that a mutated form of an endogenous, non-constitutively activated receptor evidences constitutive activation of the receptor. In terms of equivalents to specific sequences, a subsequent mutated form of a human receptor is considered to be equivalent to a first mutation of the human receptor if (a) the level of constitutive activation of the subsequent mutated form of a human receptor is substantially the same as that evidenced by the first mutation of the receptor; and (b) the percent sequence (amino acid and/or nucleic acid) homology between the subsequent mutated form of the receptor and the first mutation of the receptor is at least about 80%, more preferably at least about 90% and most preferably at least 95%. Ideally, and owing to the fact that the most preferred cassettes disclosed herein for achieving constitutive activation include a single amino acid and/or codon change between the endogenous and the non-endogenous forms of the GPCR, the percent sequence homology should be at least 98%.
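The homology thresholds above can be checked with a simple percent-identity calculation. The document does not specify how homology is computed. This Python sketch assumes an ungapped, position-by-position comparison of two sequences of equal length, which suits point mutants of the same receptor. The function names are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Ungapped percent identity between two sequences of equal length."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def meets_threshold(seq_a: str, seq_b: str, threshold: float = 98.0) -> bool:
    """True if the two sequences reach the given percent identity."""
    return percent_identity(seq_a, seq_b) >= threshold
```

For a hypothetical 350-residue receptor, a single amino acid change gives 349/350, or about 99.7% identity, which clears the 98% level. Ten changes give about 97.1%, which falls below 98% but still exceeds 95%.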

NON-ORPHAN RECEPTOR shall mean an endogenous naturally occurring molecule specific for an endogenous naturally occurring ligand wherein the binding of a ligand to a receptor activates an intracellular signaling pathway.

ORPHAN RECEPTOR shall mean an endogenous receptor for which the 
endogenous ligand specific for that receptor has not been identified or is not known. 

PHARMACEUTICAL COMPOSITION shall mean a composition comprising at least one active ingredient, whereby the composition is amenable to investigation for a specified, efficacious outcome in a mammal (for example, and not limitation, a human). Those of ordinary skill in the art will understand and appreciate the techniques appropriate for determining whether an active ingredient has a desired efficacious outcome based upon the needs of the artisan.


PLASMID shall mean the combination of a Vector and cDNA. Generally, a Plasmid 
is introduced into a Host Cell for the purposes of replication and/or expression of the cDNA 
as a protein. 
STIMULATE or STIMULATING, in relationship to the term "response," shall mean that a response is increased in the presence of a compound as opposed to in the absence of the compound.

VECTOR in reference to cDNA shall mean a circular DNA capable of incorporating at least one cDNA and capable of incorporation into a Host Cell.

The order of the following sections is set forth for presentational efficiency and is not intended, nor should it be construed, as a limitation on the disclosure or the claims to follow.

A. Introduction 

The traditional study of receptors has always proceeded from the a priori assumption 
(historically based) that the endogenous ligand must first be identified before discovery could 
proceed to find antagonists and other molecules that could affect the receptor. Even in cases 
where an antagonist might have been known first, the search immediately extended to looking 
for the endogenous ligand. This mode of thinking has persisted in receptor research even after 
the discovery of constitutively activated receptors. What has not been heretofore recognized 
is that it is the active state of the receptor that is most useful for discovering agonists, partial 
agonists, and inverse agonists of the receptor. For those diseases which result from an overly 
active receptor or an under-active receptor, what is desired in a therapeutic drug is a 
compound which acts to diminish the active state of a receptor or enhance the activity of the 
receptor, respectively, not necessarily a drug which is an antagonist to the endogenous ligand. 
This is because a compound that reduces or enhances the activity of the active receptor state 
need not bind at the same site as the endogenous ligand. Thus, as taught by a method of this 
invention, any search for therapeutic compounds should start by screening compounds against 
the ligand-independent active state. 
B. Identification of Human GPCRs

The efforts of the Human Genome Project have led to the identification of a plethora of information regarding nucleic acid sequences located within the human genome; it has been the case in this endeavor that genetic sequence information has been made available without an understanding or recognition as to whether or not any particular genomic sequence does or may contain open-reading frame information that encodes human proteins. Several methods of identifying nucleic acid sequences within the human genome are within the purview of those having ordinary skill in the art. For example, and not limitation, a variety of human GPCRs disclosed herein were discovered by reviewing the GenBank™ database, while other GPCRs were discovered by utilizing a nucleic acid sequence of a previously sequenced GPCR to conduct a BLAST™ search of the EST database. Table B, below, lists several endogenous GPCRs that we have discovered, along with each GPCR's respective homologous receptor.



TABLE B

Disclosed Human Orphan GPCR | Accession Number Identified | Open Reading Frame (Base Pairs) | Per Cent Homology to Designated GPCR | Reference to Homologous GPCR (Accession No.)
hARE-3 | AL033379 | 1,260 bp | 52.3% LPA-R | U92642
hARE-4 | AC006087 | 1,119 bp | 36% P2Y5 | AF000546
hARE-5 | AC006255 | 1,104 bp | 32% Oryzias latipes | D43633
hGPR27 | AA775870 | 1,128 bp | - | -
hARE-1 | AI090920 | 999 bp | 43% KIAA0001 | D13626
hARE-2 | AA359504 | 1,122 bp | 53% GPR27 | -
hPPR1 | H67224 | 1,053 bp | 39% EBI1 | L31581
hG2A | AA754702 | 1,113 bp | 31% GPR4 | L36148
hRUP3 | AL035423 | 1,005 bp | 30% Drosophila melanogaster | 2133653
hRUP4 | AI307658 | 1,296 bp | 32% pNPGPR; 28% and 29% Zebra fish Ya and Yb, respectively | NP_004876; AAC41276 and AAB94616
hRUP5 | AC005849 | 1,413 bp | 25% DEZ; 23% FMLPR | Q99788; P21462
hRUP6 | AC005871 | 1,245 bp | 48% GPR66 | NP_006047
hRUP7 | AC007922 | 1,173 bp | 43% H3R | AF140538
hCHN3 | EST 36581 | 1,113 bp | 53% GPR27 | -
hCHN4 | AA804531 | 1,077 bp | 32% thrombin | 4503637
hCHN6 | EST 2134670 | 1,503 bp | 36% edg-1 | NP_001391
hCHN8 | EST 764455 | 1,029 bp | 47% KIAA0001 | D13626
hCHN9 | EST 1541536 | 1,077 bp | 41% LTB4R | NM_000752
hCHN10 | EST 1365839 | 1,055 bp | 35% P2Y | NM_002563

Receptor homology is useful in terms of gaining an appreciation of a role of the receptors within the human body. As the patent document progresses, we will disclose techniques for mutating these receptors to establish non-endogenous, constitutively activated versions of these receptors.

The techniques disclosed herein have also been applied to other human, orphan 
GPCRs known to the art, as will be apparent as the patent document progresses. 



C. Receptor Screening 

Screening candidate compounds against a non-endogenous, constitutively activated version of the human GPCRs disclosed herein allows for the direct identification of candidate compounds which act at these cell surface receptors, without requiring use of the receptor's endogenous ligand. By determining areas within the body where the endogenous version of the human GPCRs disclosed herein is expressed and/or over-expressed, it is possible to determine related disease/disorder states which are associated with the expression and/or over-expression of the receptor; such an approach is disclosed in this patent document.

Creation of a mutation that may evidence constitutive activation of the human GPCRs disclosed herein is based upon the distance from the proline residue that is presumed to be located within TM6 of the GPCR; this algorithmic technique is disclosed in co-pending and commonly assigned patent document U.S. Serial Number 09/170,496, incorporated herein by reference. The algorithmic technique is not predicated upon traditional sequence "alignment" but rather upon a specified distance from the aforementioned TM6 proline residue. By mutating the amino acid residue located 16 amino acid residues from this residue (presumably located in the IC3 region of the receptor) to, most preferably, a lysine residue, such activation may be obtained. Other amino acid residues may be useful in the mutation at this position to achieve this objective.
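The position arithmetic of this technique can be sketched in a few lines of Python. The sketch assumes that the position of the TM6 proline is already known; locating that proline is outside its scope. Following the parenthetical above, it also assumes that "16 amino acid residues from" this proline means 16 positions toward the N-terminus, which places the target in IC3. The function name and the toy sequence in the usage note are hypothetical.

```python
def tm6_proline_mutation(sequence: str, proline_pos: int, offset: int = 16,
                         new_residue: str = "K") -> tuple[str, str]:
    """Return (mutated_sequence, label) for the residue `offset` positions
    N-terminal to the TM6 proline. Positions are 1-based, as in "F236K"."""
    if not 1 <= proline_pos <= len(sequence) or sequence[proline_pos - 1] != "P":
        raise ValueError(f"residue {proline_pos} is not a proline")
    target = proline_pos - offset  # assumed to lie in the IC3 region
    if target < 1:
        raise ValueError("target position falls before the start of the sequence")
    original = sequence[target - 1]
    # Swap in the new residue (lysine by default), keeping the length unchanged.
    mutated = sequence[:target - 1] + new_residue + sequence[target:]
    return mutated, f"{original}{target}{new_residue}"
```

In a toy sequence with phenylalanine at position 20 and the TM6 proline at position 36, the function replaces position 20 with lysine and returns the label "F20K". This mirrors the "F236K" notation used for the H9 fusion protein in the description of Figure 3.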
D. Disease/Disorder Identification and/or Selection 

As will be set forth in greater detail below, inverse agonists to the non-endogenous, constitutively activated GPCR can most preferably be identified by the methodologies of this invention. Such inverse agonists are ideal candidates as lead compounds in drug discovery programs for treating diseases related to this receptor. Because inverse agonists to the GPCR can be directly identified, thereby allowing for the development of pharmaceutical compositions, a search for diseases and disorders associated with the GPCR is relevant. For example, scanning both diseased and normal tissue samples for the presence of the GPCR now becomes more than an academic exercise or one which might be pursued along the path of identifying an endogenous ligand to the specific GPCR. Tissue scans can be conducted across a broad range of healthy and diseased tissues. Such tissue scans provide a preferred first step in associating a specific receptor with a disease and/or disorder. See, for example, co-pending application (docket number ARE-0050) for exemplary dot-blot and RT-PCR results of several of the GPCRs disclosed herein.

Preferably, the DNA sequence of the human GPCR is used to make a probe for (a) dot-blot analysis against tissue mRNA, and/or (b) RT-PCR identification of the expression of the receptor in tissue samples. The presence of a receptor in a tissue source or a diseased tissue, or the presence of the receptor at elevated concentrations in diseased tissue compared to normal tissue, can preferably be utilized to identify a correlation with a treatment regimen, including but not limited to a treatment regimen for a disease associated with that receptor. Receptors can equally well be localized to regions of organs by this technique. Based on the known functions of the specific tissues to which the receptor is localized, the putative functional role of the receptor can be deduced.
E. Screening of Candidate Compounds 

1. Generic GPCR screening assay techniques 

When a G protein-coupled receptor becomes constitutively active, it binds to a G protein (e.g., Gq, Gs, Gi, Gz, Go) and stimulates the binding of GTP to the G protein. The G protein then acts as a GTPase and slowly hydrolyzes the GTP to GDP, whereby the receptor, under normal conditions, becomes deactivated. However, constitutively activated receptors continue to exchange GDP for GTP. A non-hydrolyzable analog of GTP, [35S]GTPγS, can be used to monitor enhanced binding to membranes which express constitutively activated receptors. It is reported that [35S]GTPγS can be used to monitor G protein coupling to membranes in the absence and presence of ligand. An example of this monitoring, among other examples well known and available to those in the art, was reported by Traynor and Nahorski in 1995. The preferred use of this assay system is for initial screening of candidate compounds because the system is generically applicable to all G protein-coupled receptors regardless of the particular G protein that interacts with the intracellular domain of the receptor.
2. Specific GPCR screening assay techniques 

Once candidate compounds are identified using the "generic" G protein-coupled receptor assay (i.e., an assay to select compounds that are agonists, partial agonists, or inverse agonists), further screening to confirm that the compounds have interacted at the receptor site is preferred. For example, a compound identified by the "generic" assay may not bind to the receptor, but may instead merely "uncouple" the G protein from the intracellular domain.

a. Gs, Gz and Gi.

Gs stimulates the enzyme adenylyl cyclase. Gi (and Gz and Go), on the other hand, 
inhibit this enzyme. Adenylyl cyclase catalyzes the conversion of ATP to cAMP; thus, 
constitutively activated GPCRs that couple to the Gs protein are associated with increased 
cellular levels of cAMP. On the other hand, constitutively activated GPCRs that couple to Gi 
(or Gz, Go) protein are associated with decreased cellular levels of cAMP. See, generally, 
"Indirect Mechanisms of Synaptic Transmission," Chpt. 8, From Neuron To Brain (3rd Ed.), 
Nichols, J.G. et al., eds., Sinauer Associates, Inc. (1992). Thus, assays that detect cAMP can 
be utilized to determine if a candidate compound is, e.g., an inverse agonist to the receptor 
(i.e., such a compound would decrease the levels of cAMP). A variety of approaches known 
in the art for measuring cAMP can be utilized; a most preferred approach relies upon the use 
of anti-cAMP antibodies in an ELISA-based format. Another type of assay that can be 
utilized is a whole cell second messenger reporter system assay. Promoters on genes drive 
the expression of the proteins that a particular gene encodes. Cyclic AMP drives gene 
expression by promoting the binding of a cAMP-responsive DNA binding protein or 
transcription factor (CREB) that then binds to the promoter at specific sites called cAMP 
response elements and drives the expression of the gene. Reporter systems can be constructed 
which have a promoter containing multiple cAMP response elements before the reporter gene, 
e.g., β-galactosidase or luciferase. Thus, a constitutively activated Gs-linked receptor causes 
the accumulation of cAMP that then activates the gene and expression of the reporter protein. 
The reporter protein, such as β-galactosidase or luciferase, can then be detected using standard 
biochemical assays (Chen et al. 1995).
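To make the readout logic concrete, the following is a minimal Python sketch for a Gs-coupled (or Gs-fused) constitutively activated receptor, where, per the text above, an inverse agonist decreases cAMP. The function name, the 20% threshold and the example values are hypothetical choices for illustration, not part of the disclosed assay; a real screen would set its hit criteria from its own controls, and this sketch does not separate partial from full agonists.

```python
# Hypothetical sketch: assumes a Gs-coupled (or Gs-fused) receptor whose
# constitutive activity raises cAMP; threshold and values are illustrative only.
def classify_gs_camp(basal_camp, treated_camp, threshold=0.20):
    """Classify a candidate compound from cAMP levels with and without it.

    basal_camp   -- cAMP from cells expressing the constitutively active receptor
    treated_camp -- cAMP from the same cells contacted with the compound
    threshold    -- hypothetical minimum fractional change treated as real
    """
    if basal_camp <= 0:
        raise ValueError("basal cAMP must be positive")
    change = (treated_camp - basal_camp) / basal_camp
    if change <= -threshold:
        return "inverse agonist"   # receptor activity, and so cAMP, reduced
    if change >= threshold:
        return "agonist"           # receptor activity, and so cAMP, increased
    return "no detectable effect"


print(classify_gs_camp(100.0, 55.0))   # inverse agonist
print(classify_gs_camp(100.0, 150.0))  # agonist
print(classify_gs_camp(100.0, 105.0))  # no detectable effect
```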
b. Go and Gq. 

Gq and Go are associated with activation of the enzyme phospholipase C, which in 
turn hydrolyzes the phospholipid PIP2, releasing two intracellular messengers: 
diacylglycerol (DAG) and inositol 1,4,5-trisphosphate (IP3). Increased accumulation of IP3 
is associated with activation of Gq- and Go-associated receptors. See, generally, "Indirect 
Mechanisms of Synaptic Transmission," Chpt. 8, From Neuron To Brain (3rd Ed.), Nichols, 
J.G. et al., eds., Sinauer Associates, Inc. (1992). Assays that detect IP3 accumulation can be 
utilized to determine if a candidate compound is, e.g., an inverse agonist to a Gq- or 
Go-associated receptor (i.e., such a compound would decrease the levels of IP3). Gq-associated 
receptors can also be examined using an AP1 reporter assay, in that Gq-dependent 
phospholipase C causes activation of genes containing AP1 elements; thus, activated 
Gq-associated receptors will evidence an increase in the expression of such genes, whereby 
inverse agonists thereto will evidence a decrease in such expression, and agonists will 
evidence an increase in such expression. Assays for such detection are commercially 
available.



3. GPCR Fusion Protein

The use of an endogenous, constitutively activated orphan GPCR or a non-endogenous, 
constitutively activated orphan GPCR in screening candidate compounds for the direct 
identification of inverse agonists, agonists and partial agonists provides an interesting 
screening challenge in that, by definition, the receptor is active even in the absence of an 
endogenous ligand bound thereto. Thus, in order to differentiate between, e.g., the 
non-endogenous receptor in the presence of a candidate compound and the non-endogenous 
receptor in the absence of that compound, with the aim of understanding whether such 
compound may be an inverse agonist, agonist or partial agonist, or have no effect on such a 
receptor, it is preferred that an approach be utilized that can enhance such differentiation. 
A preferred approach is the use of a GPCR Fusion Protein.

Generally, once it is determined that a non-endogenous orphan GPCR has been 
constitutively activated using the assay techniques set forth above (as well as others), it is 
possible to determine the predominant G protein that couples with the endogenous GPCR. 
Coupling of the G protein to the GPCR provides a signaling pathway that can be assessed. 
Because it is most preferred that screening take place by use of a mammalian expression 
system, such a system will be expected to have endogenous G protein therein. Thus, by 
definition, in such a system, the non-endogenous, constitutively activated orphan GPCR will 
continuously signal. In this regard, it is preferred that this signal be enhanced such that, in the 
presence of, e.g., an inverse agonist to the receptor, it is more likely that one will be able to 
differentiate readily, particularly in the context of screening, between the receptor when it is 
contacted with the inverse agonist and the receptor when it is not.

The GPCR Fusion Protein is intended to enhance the efficacy of G protein coupling 
with the non-endogenous GPCR. The GPCR Fusion Protein is preferred for screening with 
a non-endogenous, constitutively activated GPCR because such an approach increases the 
signal that is most preferably utilized in such screening techniques. This is important in 
facilitating a significant "signal to noise" ratio; such a significant ratio is preferred for 
the screening of candidate compounds as disclosed herein.
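One common way to express the "signal to noise" idea is the difference between the mean signal and the mean background, divided by the standard deviation of the background. The sketch below assumes that definition (other definitions, such as a simple signal/background ratio, are also in use) and uses arbitrary illustrative replicate values, not data from this document.

```python
from statistics import mean, stdev


def signal_to_noise(signal_wells, background_wells):
    """One common definition: (mean signal - mean background) / SD of background."""
    noise = stdev(background_wells)  # sample standard deviation of the background
    if noise == 0:
        raise ValueError("background replicates show no variation")
    return (mean(signal_wells) - mean(background_wells)) / noise


# Arbitrary illustrative replicate values, not measurements from this document.
background = [100.0, 110.0, 90.0]
signal = [300.0, 310.0, 290.0]
print(signal_to_noise(signal, background))  # 20.0
```

A larger separation between signal and background, relative to background scatter, is what the fusion approach is intended to provide.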

The construction of a construct useful for expression of a GPCR Fusion Protein is 
within the purview of those having ordinary skill in the art. Commercially available 
expression vectors and systems offer a variety of approaches that can fit the particular needs 
of an investigator. The criteria of importance for such a GPCR Fusion Protein construct are 
that the endogenous GPCR sequence and the G protein sequence both be in-frame (preferably, 
the sequence for the endogenous GPCR is upstream of the G protein sequence) and that the 
"stop" codon of the GPCR be deleted or replaced such that, upon expression of the 
GPCR, the G protein can also be expressed. The GPCR can be linked directly to the G 
protein, or there can be spacer residues between the two (preferably, no more than about 12, 
although this number can be readily ascertained by one of ordinary skill in the art). We 
prefer (based upon convenience) to use a spacer, in that some restriction sites that are 
not used will, effectively, upon expression, become a spacer. Most preferably, the G protein 
that couples to the non-endogenous GPCR will have been identified prior to the creation of 
the GPCR Fusion Protein construct. Because only a few G proteins have been 
identified, it is preferred that a construct comprising the sequence of the G protein (i.e., a 
universal G protein construct) be available for insertion of an endogenous GPCR sequence 
therein; this provides for efficiency in the context of large-scale screening of a variety of 
different endogenous GPCRs having different sequences.
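The construct criteria above (in-frame fusion, GPCR stop codon deleted, optional short spacer) can be checked mechanically. The following Python sketch is a hypothetical helper, not a tool described in this document; it works on plain nucleotide strings, uses the standard stop codons TAA, TAG and TGA, and treats the "about 12" residue spacer guideline as a hard cap purely for illustration. The toy sequences are made up, with GGATCC (a BamHI site, encoding Gly-Ser) standing in for an unused restriction site that becomes a spacer.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
MAX_SPACER_RESIDUES = 12  # illustrative cap based on "no more than about 12"


def codons(seq):
    """Split a sequence into consecutive codons starting at the first base."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def build_fusion_orf(gpcr_orf, g_protein_orf, spacer=""):
    """Hypothetical helper: join GPCR ORF, spacer and G protein ORF in frame."""
    gpcr_orf, g_protein_orf, spacer = (s.upper() for s in (gpcr_orf, g_protein_orf, spacer))
    for name, seq in (("GPCR", gpcr_orf), ("G protein", g_protein_orf), ("spacer", spacer)):
        if len(seq) % 3:
            raise ValueError(f"{name} sequence is not a whole number of codons")
    if len(spacer) // 3 > MAX_SPACER_RESIDUES:
        raise ValueError("spacer is longer than about 12 residues")
    # Delete the GPCR stop codon so translation reads through into the G protein.
    if gpcr_orf[-3:] in STOP_CODONS:
        gpcr_orf = gpcr_orf[:-3]
    fusion = gpcr_orf + spacer + g_protein_orf
    # Any stop codon before the final codon would truncate the fusion protein.
    if any(c in STOP_CODONS for c in codons(fusion)[:-1]):
        raise ValueError("in-frame stop codon upstream of the G protein's own stop")
    return fusion


# Toy ORFs (hypothetical): a two-codon "GPCR" ending in TGA and a two-codon
# "G protein" ending in TAA, joined through a GGATCC spacer.
print(build_fusion_orf("ATGGCTTGA", "ATGAAATAA", spacer="GGATCC"))
# ATGGCTGGATCCATGAAATAA
```

A spacer containing an in-frame stop codon (for example "TAA") is rejected, which mirrors the requirement that the G protein also be expressed.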
As noted above, constitutively activated GPCRs that couple to Gi, Gz and Go are 
expected to inhibit the formation of cAMP, making assays based upon these types of GPCRs 
challenging (i.e., the cAMP signal decreases upon activation, thus making the direct 
identification of, e.g., inverse agonists (which would further decrease this signal) interesting). 
As will be disclosed herein, we have ascertained that for these types of receptors, it is possible 
to create a GPCR Fusion Protein that is not based upon the endogenous GPCR's endogenous 
G protein, in an effort to establish a viable cyclase-based assay. Thus, for example, for a 
Gz-coupled receptor such as H9, a GPCR Fusion Protein can be established that utilizes a Gs 
fusion protein; we believe that such a fusion construct, upon expression, "drives" or "forces" 
the non-endogenous GPCR to couple with, e.g., Gs rather than the "natural" Gz protein, such 
that a cyclase-based assay can be established. Thus, for Gi, Gz and Go coupled receptors, we 
prefer that, when a GPCR Fusion Protein is used and the assay is based upon detection of 
adenylyl cyclase activity, the fusion construct be established with Gs (or an equivalent G 
protein that stimulates the enzyme adenylyl cyclase). 
F. Medicinal Chemistry

Generally, but not always, direct identification of candidate compounds is preferably 
conducted in conjunction with compounds generated via combinatorial chemistry techniques, 
whereby thousands of compounds are randomly prepared for such analysis. Generally, the 
results of such screening will be compounds having unique core structures; thereafter, these 
compounds are preferably subjected to additional chemical modification around a preferred 
core structure(s) to further enhance the medicinal properties thereof. Such techniques are 
known to those in the art and will not be addressed in detail in this patent document. 
G. Pharmaceutical compositions

Candidate compounds selected for further development can be formulated into 
pharmaceutical compositions using techniques well known to those in the art. Suitable 
pharmaceutically-acceptable carriers are available to those in the art; for example, see 
Remington's Pharmaceutical Sciences, 16th Edition, 1980, Mack Publishing Co. (Osol et al., 
eds.).

H. Other Utility 

Although a preferred use of the non-endogenous versions of the human GPCRs disclosed 
herein may be for the direct identification of candidate compounds as inverse agonists, 
agonists or partial agonists (preferably for use as pharmaceutical agents), these versions of 
human GPCRs can also be utilized in research settings. For example, in vitro and in vivo 
systems incorporating GPCRs can be utilized to further elucidate and understand the roles 
these receptors play in the human condition, both normal and diseased, as well as the role 
of constitutive activation as it applies to the signaling cascade. Non-endogenous human 
GPCRs are particularly valuable as research tools because, owing to their unique features, 
they can be used to understand the role of these receptors in the human body before the 
endogenous ligand therefor is identified. Other uses of the disclosed receptors will become 
apparent to those in the art based upon, inter alia, a review of this patent document.

EXAMPLES 

The following examples are presented for purposes of elucidation, and not limitation, of 
the present invention. While specific nucleic acid and amino acid sequences are disclosed 
herein, those of ordinary skill in the art are credited with the ability to make minor 
modifications to these sequences while achieving the same or substantially similar results 
reported below. The traditional approach to application or understanding of sequence 
cassettes from one sequence to another (e.g., from rat receptor to human receptor or from 
human receptor A to human receptor B) is generally predicated upon sequence alignment 
techniques whereby the sequences are aligned in an effort to determine areas of commonality. 
The mutational approach disclosed herein does not rely upon this approach but is instead 
based upon an algorithmic approach and a positional distance from a conserved proline 
residue located within the TM6 region of human GPCRs. Once this approach is secured, 
those in the art are credited with the ability to make minor modifications thereto to achieve 
substantially the same results (i.e., constitutive activation) disclosed herein. Such modified 
approaches are considered within the purview of this disclosure.
Example 1 

Endogenous Human GPCRs

1. Identification of Human GPCRs

Certain of the disclosed endogenous human GPCRs were identified based upon a 
review of the GenBank™ database information. While searching the database, the following 
cDNA clones were identified as evidenced below (Table C).

TABLE C

Disclosed Human Orphan GPCRs | Accession Number | Complete DNA Sequence (Base Pairs) | Open Reading Frame (Base Pairs) | Nucleic Acid SEQ.ID.NO. | Amino Acid SEQ.ID.NO.
hARE-3 | AL033379 | 111,389 bp | 1,260 bp | 1 | 2
hARE-4 | AC006087 | 226,925 bp | 1,119 bp | 3 | 4
hARE-5 | AC006255 | 127,605 bp | 1,104 bp | 5 | 6
hRUP3 | AL035423 | 140,094 bp | 1,005 bp | 7 | 8
hRUP5 | AC005849 | 169,144 bp | 1,413 bp | 9 | 10
hRUP6 | AC005871 | 218,807 bp | 1,245 bp | 11 | 12
hRUP7 | AC007922 | 158,858 bp | 1,173 bp | 13 | 14
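As a quick consistency check on Table C, each open reading frame length can be converted into a codon count. The sketch assumes, which the table does not state, that the listed ORF lengths include the stop codon, so the predicted protein length is one less than the codon count; every listed length is divisible by 3, consistent with complete reading frames.

```python
def residues_from_orf(orf_bp, includes_stop=True):
    """Predicted amino acid count for an ORF length given in base pairs.

    includes_stop is an assumption: Table C does not say whether the
    stop codon is counted in the ORF length.
    """
    if orf_bp % 3:
        raise ValueError("ORF length must be a multiple of 3")
    codon_count = orf_bp // 3
    return codon_count - 1 if includes_stop else codon_count


# ORF lengths copied from Table C.
table_c_orfs = {"hARE-3": 1260, "hARE-4": 1119, "hARE-5": 1104, "hRUP3": 1005,
                "hRUP5": 1413, "hRUP6": 1245, "hRUP7": 1173}
for name, bp in table_c_orfs.items():
    print(name, bp // 3, "codons ->", residues_from_orf(bp), "residues (assumed)")
```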



Other disclosed endogenous human GPCRs were identified by conducting a BLAST™ 
search of the EST database (dbEST) using the query sequences listed below. The EST clones 
thereby identified were then used as probes to screen a human genomic library (Table D).



TABLE D

Disclosed Human Orphan GPCRs | Query (Sequence) | EST Clone/Accession No. Identified | Open Reading Frame (Base Pairs) | Nucleic Acid SEQ.ID.NO. | Amino Acid SEQ.ID.NO.
hGPCR27 | Mouse GPCR27 | AA775870 | 1,125 bp | 17 | 18
hARE-1 | TDAG | 1689643; AI090920 | 999 bp | 19 | 20
hARE-2 | GPCR27 | 68530; AA359504 | 1,122 bp | 21 | 22
hPPR1 | Bovine PPR1 | 238667; H67224 | 1,053 bp | 23 | 24
hG2A | Mouse 1179426 | See Example 2(a), below | 1,113 bp | 25 | 26
hCHN3 | N.A. | EST 36581 (full length) | U13 bp | 27 | 28
hCHN4 | TDAG | 1184934; AA804531 | 1,077 bp | 29 | 30
hCHN6 | N.A. | EST 2134670 (full length) | 1,503 bp | 31 | 32
hCHN8 | KIAA0001 | EST 764455 | 1,029 bp | 33 | 34
hCHN9 | 1365839 | EST 1541536 | 1,077 bp | 35 | 36
hCHN10 | Mouse EST 1365839 | Human 1365839 | 1,005 bp | 37 | 38
hRUP4 | N.A. | AI307658 | 1,296 bp | 39 | 40

N.A. = "not applicable"



2. Full Length Cloning 



a. Human G2A 

Mouse EST clone 1179426 was used to obtain a human genomic clone containing all 
but three amino acids of the G2A coding sequence. The 5' end of this coding sequence was 
obtained by 5' RACE, using Clontech's Human Spleen Marathon-Ready™ cDNA as the 
template for PCR. The disclosed human G2A was amplified by PCR using the G2A 
cDNA-specific primers for the first and second round PCR shown in SEQ.ID.NO.: 41 and 
SEQ.ID.NO.: 42, as follows:

5'-CTGTGTACAGCAGTTCGCAGAGTG-3' (SEQ.ID.NO.: 41; first round PCR)

5'-GAGTGCCAGGCAGAGCAGGTAGAC-3' (SEQ.ID.NO.: 42; second round PCR).

PCR was performed using the Advantage GC Polymerase Kit (Clontech; manufacturer's 
instructions followed) at 94°C for 30 sec, followed by 5 cycles of 94°C for 5 sec and 
72°C for 4 min, and 30 cycles of 94°C for 5 sec and 70°C for 4 min. An approximately 1.3 kb 
PCR fragment was purified from an agarose gel, digested with HindIII and XbaI, and cloned 
into the expression vector pRc/CMV2 (Invitrogen). The cloned insert was sequenced using the 
T7 Sequenase™ kit (USB Amersham; manufacturer's instructions followed) and the sequence 
was compared with the presented sequence. Expression of the human G2A was detected by 
probing an RNA dot blot (Clontech; manufacturer's instructions followed) with the 
32P-labeled fragment.
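The cycling program above can be tallied to estimate how long the reaction spends at its programmed temperatures. This Python sketch sums hold times only; it ignores ramp time between temperatures (which depends on the instrument), and the list-of-tuples representation is a hypothetical choice for illustration.

```python
def hold_time_seconds(program):
    """Sum programmed hold times.

    program: list of (repeats, [(temperature_c, seconds), ...]) blocks.
    Ramp times between temperatures are not included.
    """
    return sum(repeats * sum(sec for _temp, sec in steps) for repeats, steps in program)


# The human G2A program as described above.
g2a_program = [
    (1, [(94, 30)]),              # 94 C for 30 sec
    (5, [(94, 5), (72, 240)]),    # 5 cycles: 94 C 5 sec, 72 C 4 min
    (30, [(94, 5), (70, 240)]),   # 30 cycles: 94 C 5 sec, 70 C 4 min
]
total = hold_time_seconds(g2a_program)
print(total, "s =", round(total / 60, 1), "min")  # 8605 s = 143.4 min
```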

b. CHN9 

Sequencing of the EST clone 1541536 showed CHN9 to be a partial cDNA clone 
having only an initiation codon; i.e., the termination codon was missing. When CHN9 
was used in a BLAST search against the nr database, the 3' sequence of CHN9 was 100% 
homologous to the 5' untranslated region of the leukotriene B4 receptor (LTB4R) cDNA, which 
contained a termination codon in frame with the CHN9 coding sequence. To determine 
whether the 5' untranslated region of LTB4R cDNA was the 3' sequence of CHN9, PCR was 
performed using primers based upon the 5' sequence flanking the initiation codon found in 
CHN9 and the 3' sequence around the termination codon found in the LTB4R 5' untranslated 
region. The primer sequences utilized were as follows:

5'-CCCGAATTCCTGCTTGCTCCCAGCTTGGCCC-3' (SEQ.ID.NO.: 43; sense) and
5'-TGTGGATCCTGCTGTCAAAGGTCCCATTCCGG-3' (SEQ.ID.NO.: 44; antisense).

PCR was performed using thymus cDNA as a template and rTth polymerase (Perkin Elmer) 
with the buffer system provided by the manufacturer, 0.25 µM of each primer, and 0.2 mM 
of each of the 4 nucleotides. The cycle condition was 30 cycles of 94°C for 1 min, 65°C for 
1 min and 72°C for 1 min and 10 sec. A 1.1 kb fragment consistent with the predicted size 
was obtained from PCR. This PCR fragment was subcloned into pCMV (see below) and 
sequenced (see, SEQ.ID.NO.: 35).
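Primers SEQ.ID.NO.: 43 and 44 appear to carry EcoRI (GAATTC) and BamHI (GGATCC) recognition sites near their 5' ends, in the same way that the AT1 primers SEQ.ID.NO.: 63 and 64 are described as carrying HindIII and BamHI sites. The following Python sketch scans a primer for these three sites; the helper name is hypothetical, and because all three recognition sequences are palindromic, scanning one strand is sufficient.

```python
# Recognition sequences of the three enzymes used in this section. All three
# are palindromic, so a site on one strand is also present on the other.
SITES = {"HindIII": "AAGCTT", "BamHI": "GGATCC", "EcoRI": "GAATTC"}


def find_sites(primer):
    """Map enzyme name -> 0-based start of its first recognition site in primer."""
    primer = primer.upper()
    return {name: primer.find(site) for name, site in SITES.items() if site in primer}


print(find_sites("CCCGAATTCCTGCTTGCTCCCAGCTTGGCCC"))   # SEQ.ID.NO.: 43 -> {'EcoRI': 3}
print(find_sites("TGTGGATCCTGCTGTCAAAGGTCCCATTCCGG"))  # SEQ.ID.NO.: 44 -> {'BamHI': 3}
print(find_sites("CCCAAGCTTCCCCAGGTGTATTTGAT"))        # SEQ.ID.NO.: 63 -> {'HindIII': 3}
```

The few bases 5' of each site are commonly added so the enzyme can cut near the end of a PCR product.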

c. RUP4 

The full length RUP4 was cloned by RT-PCR with human brain cDNA (Clontech) as 
template, using the following primers:

5'-TCACAATGCTAGGTGTGGTC-3' (SEQ.ID.NO.: 45; sense) and
5'-TGCATAGACAATGGGATTACAG-3' (SEQ.ID.NO.: 46; antisense).

PCR was performed using TaqPlus Precision™ polymerase (Stratagene; manufacturer's 
instructions followed) with the following cycles: 94°C for 2 min; 94°C for 30 sec; 55°C for 
30 sec; 72°C for 45 sec; and 72°C for 10 min. Cycles 2 through 4 were repeated 30 times.

The PCR products were separated on a 1% agarose gel, and a 500 bp PCR fragment was 
isolated, cloned into the pCRII-TOPO™ vector (Invitrogen) and sequenced using the 
T7 DNA Sequenase™ kit (Amersham) and the SP6/T7 primers (Stratagene). Sequence analysis 
revealed that the PCR fragment was indeed an alternatively spliced form of AI307658 having 
a continuous open reading frame with similarity to other GPCRs. The completed sequence 
of this PCR fragment was as follows:
5'-TCACAATGCTAGGTGTGGTCTGGCTGGTGGCAGTCATCGTAGGATCACCCATGTGGCAC

GTGCAACAACTTGAGATCAAATATGACTTCCTATATGAAAAGGAACACATCTGCTGCTTAAGA 

GTGGACCAGCCCTGTGCACCAGAAGATCTACACCACCTTCATCCTTGTCATCCTCTTCCTCCTGC 

CTCTTATGGTGATGCTTATTCTGTACGTAAAATTGGTTATGAACTTTGGATAAAGAAAAGAGTT 

GGGGATGGTTCAGTGCTTCGAACTATTCATGGAAAAGAAATGTCCAAAATAGCCAGGAAGAAG 

AAACGAGCTGTCATTATGATGGTGACAGTGGTGGCTCTCTTTGCTGTGTGCTGGGCACCATTCC 

ATGTTGTCCATATGATGATTGAATACAGTAATTTTGAAAAGGAATATGATGATGTCACAATCAA 

GATGATTTTTGCTATCGTGCAAATTATTGGATTTTCCAACTCCATCTGTAATCCCAT^ 

3' (SEQ.ID.NO.: 47)

Based on the above sequence, two sense oligonucleotide primers:

5'-CTGCTTAGAAGAGTGGACCAG-3' (SEQ.ID.NO.: 48; oligo 1),

5'-CTGTGCACCAGAAGATCTACAC-3' (SEQ.ID.NO.: 49; oligo 2) and

two antisense oligonucleotide primers:

5'-CAAGGATGAAGGTGGTGTAGA-3' (SEQ.ID.NO.: 50; oligo 3)
5'-GTGTAGATCTTCTGGTGCACAGG-3' (SEQ.ID.NO.: 51; oligo 4)
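The primer design can be checked against the SEQ.ID.NO.: 47 fragment: a sense primer should appear verbatim in the top strand, and the reverse complement of an antisense primer should too. This Python sketch uses one line of the fragment as printed above; the helper name `revcomp` is a hypothetical illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence written with A/C/G/T."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# One line of the SEQ.ID.NO.: 47 fragment, copied from the sequence above.
segment = "GTGGACCAGCCCTGTGCACCAGAAGATCTACACCACCTTCATCCTTGTCATCCTCTTCCTCCTGC"
oligo2 = "CTGTGCACCAGAAGATCTACAC"    # SEQ.ID.NO.: 49, sense
oligo3 = "CAAGGATGAAGGTGGTGTAGA"     # SEQ.ID.NO.: 50, antisense
oligo4 = "GTGTAGATCTTCTGGTGCACAGG"   # SEQ.ID.NO.: 51, antisense

print(oligo2 in segment)            # True: sense primer matches the top strand
print(revcomp(oligo3) in segment)   # True: antisense primer anneals to the top strand
print(revcomp(oligo4))              # CCTGTGCACCAGAAGATCTACAC
```

Oligo 2 and oligo 4 overlap almost completely, which is why they can serve as a matched sense/antisense pair for the 3' and 5' RACE reactions.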

were used for 3'- and 5'-RACE PCR with a human brain Marathon-Ready™ cDNA 
(Clontech, Cat# 7400-1) as template, according to the manufacturer's instructions. DNA 
fragments generated by the RACE PCR were cloned into the pCRII-TOPO™ vector 
(Invitrogen) and sequenced using the SP6/T7 primers (Stratagene) and some internal primers. 
The 3' RACE product contained a poly(A) tail and a complete open reading frame ending 
at a TAA stop codon. The 5' RACE product contained an incomplete 5' end; i.e., the ATG 
initiation codon was not present.

Based on the new 5' sequence, oligo 3 and the following primer:
5'-GCAATGCAGGTCATAGTGAGC-3' (SEQ.ID.NO.: 52; oligo 5)

were used for the second round of 5' RACE PCR, and the PCR products were analyzed as above. 
A third round of 5' RACE PCR was carried out utilizing the antisense primers:
5'-TGGAGCATGGTGACGGGAATGCAGAAG-3' (SEQ.ID.NO.: 53; oligo 6) and
5'-GTGATGAGCAGGTCACTGAGCGCCAAG-3' (SEQ.ID.NO.: 54; oligo 7).

The sequence of the 5' RACE PCR products revealed the presence of the initiation codon 
ATG, and a further round of 5' RACE PCR did not generate any more 5' sequence. The 
completed 5' sequence was confirmed by RT-PCR using the sense primer 
5'-GCAATGCAGGCGCTTAACATTAC-3' (SEQ.ID.NO.: 55; oligo 8)

and oligo 4 as primers, and by sequence analysis of the 650 bp PCR product generated from 
human brain and heart cDNA templates (Clontech, Cat# 7404-1). The completed 3' sequence 
was confirmed by RT-PCR using oligo 2 and the following antisense primer:

5'-TTGGGTTACAATCTGAAGGGCA-3' (SEQ.ID.NO.: 56; oligo 9)

and by sequence analysis of the 670 bp PCR product generated from human brain and heart 
cDNA templates (Clontech, Cat# 7404-1).

d. RUP5 

The full length RUP5 was cloned by RT-PCR using a sense primer upstream from 
ATG, the initiation codon (SEQ.ID.NO.: 57), and an antisense primer containing TCA as the 
stop codon (SEQ.ID.NO.: 58), which had the following sequences:

5'-ACTCCGTGTCCAGCAGGACTCTG-3' (SEQ.ID.NO.: 57)
5'-TGCGTGTTCCTGGACCCTCACGTG-3' (SEQ.ID.NO.: 58)

and human peripheral leukocyte cDNA (Clontech) as a template. Advantage™ cDNA 
polymerase (Clontech) was used for the amplification in a 50 µl reaction by the following cycle, 
with steps 2 through 4 repeated 30 times: 94°C for 30 sec; 94°C for 15 sec; 69°C for 40 sec; 
72°C for 3 min; and 72°C for 6 min. A 1.4 kb PCR fragment was isolated, cloned into 
the pCRII-TOPO™ vector (Invitrogen) and completely sequenced using the T7 DNA 
Sequenase™ kit (Amersham). See, SEQ.ID.NO.: 9.

e. RUP6 

The full length RUP6 was cloned by RT-PCR using primers:
5'-CAGGCCTTGGATTTTAATGTCAGGGATGG-3' (SEQ.ID.NO.: 59) and
5'-GGAGAGTCAGCTCTGAAAGAATTCAGG-3' (SEQ.ID.NO.: 60);

and human thymus Marathon-Ready™ cDNA (Clontech) as a template. Advantage cDNA 
polymerase (Clontech, according to manufacturer's instructions) was used for the 
amplification in a 50 µl reaction by the following cycle: 94°C for 30 sec; 94°C for 5 sec; 66°C 
for 40 sec; 72°C for 2.5 sec and 72°C for 7 min. Cycles 2 through 4 were repeated 30 times. 
A 1.3 kb PCR fragment was isolated, cloned into the pCRII-TOPO™ vector (Invitrogen) 
and completely sequenced (see, SEQ.ID.NO.: 11) using the ABI Big Dye Terminator™ kit 
(P.E. Biosystems).

f. RUP7 

The full length RUP7 was cloned by RT-PCR using primers:
5'-TGATGTGATGCCAGATACTAATAGCAC-3' (SEQ.ID.NO.: 61; sense) and
5'-CCTGATTCATTTAGGTGAGATTGAGAC-3' (SEQ.ID.NO.: 62; antisense)

and human peripheral leukocyte cDNA (Clontech) as a template. Advantage™ cDNA 
polymerase (Clontech) was used for the amplification in a 50 µl reaction by the following 
cycle, with steps 2 to 4 repeated 30 times: 94°C for 2 minutes; 94°C for 15 seconds; 60°C 
for 20 seconds; 72°C for 2 minutes; 72°C for 10 minutes. A 1.25 kb PCR fragment was 
isolated, cloned into the pCRII-TOPO™ vector (Invitrogen) and completely sequenced 
using the ABI Big Dye Terminator™ kit (P.E. Biosystems). See, SEQ.ID.NO.: 13.
3. Angiotensin II Type 1 Receptor ("AT1")
The endogenous human angiotensin II type 1 receptor ("AT1") was obtained by PCR 
using genomic DNA as template and rTth polymerase (Perkin Elmer) with the buffer system 
provided by the manufacturer, 0.25 µM of each primer, and 0.2 mM of each of the 4 nucleotides. 
The cycle condition was 30 cycles of 94°C for 1 min, 55°C for 1 min and 72°C for 1.5 min. 
The 5' PCR primer contains a HindIII site with the sequence:
5'-CCCAAGCTTCCCCAGGTGTATTTGAT-3' (SEQ.ID.NO.: 63)

and the 3' primer contains a BamHI site with the following sequence:

5'-GTTGGATCCACATAATGCATTTTCTC-3' (SEQ.ID.NO.: 64).

The resulting 1.3 kb PCR fragment was digested with HindIII and BamHI and cloned into the 
HindIII-BamHI site of the pCMV expression vector. The cDNA clone was fully sequenced. 
Nucleic acid (SEQ.ID.NO.: 65) and amino acid (SEQ.ID.NO.: 66) sequences for human AT1 
were thereafter determined and verified.
4. GPR38 

To obtain GPR38, PCR was performed by combining two PCR fragments, using 
human genomic cDNA as template and rTth polymerase (Perkin Elmer) with the buffer system 
provided by the manufacturer, 0.25 µM of each primer, and 0.2 mM of each of the 4 nucleotides. 
The cycle condition for each PCR reaction was 30 cycles of 94°C for 1 min, 62°C for 1 min 
and 72°C for 2 min.

The first fragment was amplified with the 5' PCR primer that contained an end site 
with the following sequence:

5'-ACCATGGGCAGCCCCTGGAACGGCAGC-3' (SEQ.ID.NO.: 67)
and a 3' primer having the following sequence:

5'-AGAACCACCACCAGCAGGACGCGGACGGTCTGCCGGTGG-3' (SEQ.ID.NO.: 68).
The second PCR fragment was amplified with a 5' primer having the following sequence:
5'-GTCCGCGTCCTGCTGGTGGTGGTTCTGGCATTTATAATT-3' (SEQ.ID.NO.: 69)

and a 3' primer that contained a BamHI site and had the following sequence:
5'-CCTGGATCCTTATCCCATCGTCTTCACGTTAGC-3' (SEQ.ID.NO.: 70).

The two fragments were used as templates to amplify GPR38, using SEQ.ID.NO.: 67 and 
SEQ.ID.NO.: 70 as primers (using the above-noted cycle conditions). The resulting 1.44 kb 
PCR fragment was digested with BamHI and cloned into the blunt-BamHI site of the pCMV 
expression vector.
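The two-fragment strategy works because the 3' primer of the first fragment (SEQ.ID.NO.: 68) and the 5' primer of the second fragment (SEQ.ID.NO.: 69) are complementary over part of their length, so the two products can anneal to each other and be extended in the final PCR (an overlap-extension approach, a term the text itself does not use). The Python sketch below measures that overlap; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence written with A/C/G/T."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def end_overlap(left, right):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


seq68 = "AGAACCACCACCAGCAGGACGCGGACGGTCTGCCGGTGG"   # 3' primer, first fragment
seq69 = "GTCCGCGTCCTGCTGGTGGTGGTTCTGGCATTTATAATT"  # 5' primer, second fragment

# Put SEQ.ID.NO.: 68 on the sense strand, then look for shared sequence.
k = end_overlap(revcomp(seq68), seq69)
print(k, seq69[:k])  # 26 GTCCGCGTCCTGCTGGTGGTGGTTCT
```

A 26-base shared stretch is long enough for the two fragment ends to anneal under the stated 62°C annealing condition, after which the outer primers (SEQ.ID.NO.: 67 and 70) amplify the joined product.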

5. MC4 

To obtain MC4, PCR was performed using human genomic cDNA as template and 
rTth polymerase (Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM 
of each primer, and 0.2 mM of each of the 4 nucleotides. The cycle condition for each PCR 
reaction was 30 cycles of 94°C for 1 min, 54°C for 1 min and 72°C for 1.5 min. 
The 5' PCR primer contained an EcoRI site with the sequence: 
5'-CTGGAATTCTCCTGCCAGCATGGTGA-3' (SEQ.ID.NO.: 71) 
and the 3' primer contained a BamHI site with the sequence: 
5'-GCAGGATCCTATATTGCGTGCTCTGTCCCC-3' (SEQ.ID.NO.: 72). 

The 1.0 kb PCR fragment was digested with EcoRI and BamHI and cloned into the EcoRI-BamHI 
site of the pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 73) and amino acid 
(SEQ.ID.NO.: 74) sequences for human MC4 were thereafter determined.

6. CCKB 

To obtain CCKB, PCR was performed using human stomach cDNA as template and 
rTth polymerase (Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM 
of each primer, and 0.2 mM of each of the 4 nucleotides. The cycle condition for each PCR 
reaction was 30 cycles of 94°C for 1 min, 65°C for 1 min and 72°C for 1 min and 30 sec. 
The 5' PCR primer contained a HindIII site with the sequence: 

5'-CCGAAGCTTCGAGCTGAGTAAGGCGGCGGGCT-3' (SEQ.ID.NO.: 75) 
and the 3' primer contained an EcoRI site with the sequence: 
5'-GTGGAATTCATTTGCCCTGCCTCAACCCCCA-3' (SEQ.ID.NO.: 76). 

The resulting 1.44 kb PCR fragment was digested with HindIII and EcoRI and cloned into the 
HindIII-EcoRI site of the pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 77) and amino 
acid (SEQ.ID.NO.: 78) sequences for human CCKB were thereafter determined.

7. TDAG8 

To obtain TDAG8, PCR was performed using genomic DNA as template and rTth 
polymerase (Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of 
each primer, and 0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of 
94°C for 1 min, 56°C for 1 min and 72°C for 1 min and 20 sec. The 5' PCR primer contained a 
HindIII site with the following sequence:

5'-TGCAAGCTTAAAAAGGAAAAAATGAACAGC-3' (SEQ.ID.NO.: 79)

and the 3' primer contained a BamHI site with the following sequence:

5'-TAAGGATCCCTTCCCTTCAAAACATCCTTG-3' (SEQ.ID.NO.: 80).

The resulting 1.1 kb PCR fragment was digested with HindIII and BamHI and cloned into the 
HindIII-BamHI site of the pCMV expression vector. Three resulting clones were sequenced and 
contained three potential polymorphisms, involving changes of amino acid 43 from Pro to Ala, 
amino acid 97 from Lys to Asn, and amino acid 130 from Ile to Phe. Nucleic acid 
(SEQ.ID.NO.: 81) and amino acid (SEQ.ID.NO.: 82) sequences for human TDAG8 were 
thereafter determined.

8. H9 

To obtain H9, PCR was performed using pituitary cDNA as template and rTth 
polymerase (Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of 
each primer, and 0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of 
94°C for 1 min, 62°C for 1 min and 72°C for 2 min. The 5' PCR primer contained a HindIII site 
with the following sequence:

5'-GGAAAGCTTAACGATCCCCAGGAGCAACAT-3' (SEQ.ID.NO.: 15)
and the 3' primer contained a BamHI site with the following sequence:
5'-CTGGGATCCTACGAGAGCATTTTTCACACAG-3' (SEQ.ID.NO.: 16).

The resulting 1.9 kb PCR fragment was digested with HindIII and BamHI and cloned into the 
HindIII-BamHI site of the pCMV expression vector. H9 contained three potential polymorphisms 
involving the amino acid changes P320S, S493N and G448A. Nucleic acid 
(SEQ.ID.NO.: 139) and amino acid (SEQ.ID.NO.: 140) sequences for human H9 were 
thereafter determined and verified.

Preparation of Non-Endogenous, Constitutively Activated GPCRs

Those skilled in the art are credited with the ability to select techniques for 
mutation of a nucleic acid sequence. Presented below are approaches utilized to create 
non-endogenous versions of several of the human GPCRs disclosed above. The mutations 
disclosed below are based upon an algorithmic approach whereby the 16th amino acid 
(located in the IC3 region of the GPCR) from a conserved proline residue (located in the 
TM6 region of the GPCR, near the TM6/IC3 interface) is mutated, most preferably to a 
lysine amino acid residue.
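The positional rule can be expressed as a short function. This Python sketch assumes that the conserved TM6 proline has already been located (its 1-based position is passed in) and assumes a counting convention in which the target is the residue 16 positions N-terminal of the proline (proline position minus 16); the text does not spell out whether the proline itself is counted, so the offset is left as a parameter. Table E shows that where the native residue is already lysine (hG2A), a K-to-A change was used instead, which the sketch covers only through its hypothetical `new_residue` argument. The toy sequence is made up.

```python
def tm6_mutation(protein, proline_pos, offset=16, new_residue="K"):
    """Return (label, mutant_sequence) for the residue `offset` positions
    N-terminal of the conserved TM6 proline at 1-based `proline_pos`.

    The counting convention (target = proline_pos - offset) is an assumption.
    """
    if not 1 <= proline_pos <= len(protein) or protein[proline_pos - 1] != "P":
        raise ValueError("no proline at the given position")
    target = proline_pos - offset
    if target < 1:
        raise ValueError("offset runs past the N-terminus")
    native = protein[target - 1]
    mutant = protein[:target - 1] + new_residue + protein[target:]
    # Standard substitution notation: native residue, position, new residue.
    return f"{native}{target}{new_residue}", mutant


# Made-up toy sequence with F at position 14 and the "TM6" proline at position 30.
toy = "A" * 13 + "F" + "A" * 15 + "P" + "AAAA"
label, mutant = tm6_mutation(toy, proline_pos=30)
print(label)  # F14K
```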

1. Transformer Site-Directed™ Mutagenesis

Preparation of non-endogenous human GPCRs may be accomplished on human 
GPCRs using the Transformer Site-Directed™ Mutagenesis Kit (Clontech) according to the 
manufacturer's instructions. Two mutagenesis primers are utilized, most preferably a lysine 
mutagenesis oligonucleotide that creates the lysine mutation, and a selection marker 
oligonucleotide. For convenience, the codon mutation to be incorporated into the human 
GPCR is also noted, in standard form (Table E):
TABLE E

Receptor Identifier | Codon Mutation
hARE-3 | F313K
hARE-4 | V233K
hARE-5 | A240K
hGPCR14 | L257K
hGPCR27 | C283K
hARE-1 | E232K
hARE-2 | G285K
hPPR1 | L239K
hG2A | K232A
hRUP3 | L224K
hRUP5 | A236K
hRUP6 | N267K
hRUP7 | A302K
hCHN4 | V236K
hMC4 | A244K
hCHN3 | S284K
hCHN6 | L352K
hCHN8 | N235K
hCHN9 | G223K
hCHN10 | L231K
hH9 | F236K



The following GPCRs were mutated in accordance with the above method using the 
designated sequence primers (Table F).
TABLE F

Receptor Identifier | Codon Mutation | Lysine Mutagenesis (SEQ.ID.NO.), 5'-3' orientation, mutation sequence underlined | Selection Marker (SEQ.ID.NO.), 5'-3' orientation
hRUP4 | V272K | CAGGAAGAAGAAACGAGCTGTCATTATGATGGTGACAGTG (83) | CACTGTCACCATCATAATGACAGCTCGTTTCTTCTTCCTG (84)
hAT1 | see below | alternative approach; see below | alternative approach; see below
hGPR38 | V297K | GGCCACCGGCAGACCAAACGCGTCCTGCTG (85) | CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT (86)
hCCKB | V332K | alternative approach; see below | alternative approach; see below
hTDAG8 | I225K | GGAAAAGAAGAGAATCAAAAAACTACTTGTCAGCATC (87) | CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT (88)
hH9 | F236K | GCTGAGGTTCGCAATAAACTAACCATGTTTGTG (143) | CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT (144)
hMC4 | A244K | GCCAATATGAAGGGAAAAATTACCTTGACCATC (137) | CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT (138)
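For hGPR38, the lysine mutagenesis primer (SEQ.ID.NO.: 85) can be compared with the wild-type sense sequence implied by the cloning primer SEQ.ID.NO.: 68. The following Python sketch does that; it assumes that the first two bases of SEQ.ID.NO.: 85 extend 5' beyond the region covered by SEQ.ID.NO.: 68 and skips them. The three-base change it finds, GTC (valine) to AAA (lysine), is consistent with the V297K entry, although the sketch does not itself establish the codon number.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence written with A/C/G/T."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def mismatch_positions(a, b):
    """0-based positions where two co-aligned sequences differ (over their overlap)."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


seq68 = "AGAACCACCACCAGCAGGACGCGGACGGTCTGCCGGTGG"  # GPR38 cloning primer (antisense)
seq85 = "GGCCACCGGCAGACCAAACGCGTCCTGCTG"           # hGPR38 lysine mutagenesis primer

wild_type = revcomp(seq68)  # sense-strand sequence covered by SEQ.ID.NO.: 68
aligned = seq85[2:]         # assumed: leading "GG" lies 5' of that region
diffs = mismatch_positions(aligned, wild_type)
print(diffs)                                   # [13, 14, 15]
print(wild_type[13:16], "->", aligned[13:16])  # GTC -> AAA
```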



The non-endogenous human GPCRs were then sequenced and the derived and 
verified nucleic acid and amino acid sequences are listed in the accompanying "Sequence 
Listing" appendix to this patent document, as summarized in Table G below: 

TABLE G

Non-Endogenous Human GPCR | Nucleic Acid Sequence Listing | Amino Acid Sequence Listing
hRUP4 (V272K) | SEQ.ID.NO.: 127 | SEQ.ID.NO.: 128
hAT1 | (see alternative approaches below) | (see alternative approaches below)
hGPR38 (V297K) | SEQ.ID.NO.: 129 | SEQ.ID.NO.: 130
hCCKB (V332K) | SEQ.ID.NO.: 131 | SEQ.ID.NO.: 132
hTDAG8 (I225K) | SEQ.ID.NO.: 133 | SEQ.ID.NO.: 134
hH9 (F236K) | SEQ.ID.NO.: 141 | SEQ.ID.NO.: 142
hMC4 (A244K) | SEQ.ID.NO.: 135 | SEQ.ID.NO.: 136
2. Alternative Approaches For Creation of 
Non-Endogenous Human GPCRs

a. AT1

1. F239K Mutation

Preparation of a non-endogenous, constitutively activated human AT1 receptor was 
accomplished by creating an F239K mutation (see, SEQ.ID.NO.: 89 for nucleic acid sequence 
and SEQ.ID.NO.: 90 for amino acid sequence). Mutagenesis was performed using the 
Transformer Site-Directed Mutagenesis™ Kit (Clontech) according to the manufacturer's 
instructions. Two mutagenesis primers were used, a lysine mutagenesis oligonucleotide 
(SEQ.ID.NO.: 91) and a selection marker oligonucleotide (SEQ.ID.NO.: 92), which had the 
following sequences:

5'-CCAAGAAATGATGATATTAAAAAGATAATTATGGCo* (SEQ.ID.NO.: 91)
5'-CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT-3' (SEQ.ID.NO.: 92),

respectively.

2. N111A Mutation

Preparation of a non-endogenous human AT1 receptor was also accomplished by 
creating an N111A mutation (see, SEQ.ID.NO.: 93 for nucleic acid sequence, and 
SEQ.ID.NO.: 94 for amino acid sequence). Two PCR reactions were performed using Pfu 
polymerase (Stratagene) with the buffer system provided by the manufacturer, 
supplemented with 10% DMSO, 0.25 µM of each primer, and 0.5 mM of each of the 4 
nucleotides. The 5' PCR sense primer used had the following sequence:
5'-CCCAAGCTTCCCCAGGTGTATTTGAT-3' (SEQ.ID.NO.: 95)
and the antisense primer had the following sequence:
5'-CCTGCAGGCGAAACTGACTCTGGCTGAAG-3' (SEQ.ID.NO.: 96).

The resulting 400 bp PCR fragment was digested with HindIII and subcloned into the
HindIII-SmaI site of the pCMV vector (5' construct). The 3' PCR sense primer used had the
following sequence:

5'-CTGTACGCTAGTGTGTTTCTACTCACGTGTCTCAGCATTGAT-3' (SEQ.ID.NO.: 97)

and the antisense primer had the following sequence:
5'-GTTGGATCCACATAATGCATTTTCTC-3' (SEQ.ID.NO.: 98)

The resulting 880 bp PCR fragment was digested with BamHI and inserted into the PstI
(blunted by T4 polymerase) and BamHI sites of the 5' construct to generate the full-length
N111A construct. The cycle condition was 25 cycles of 94°C for 1 min, 60°C for 1 min
and 72°C for 1 min (5' PCR) or 1.5 min (3' PCR).
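The two thermal profiles above can be written down as data, which makes the 5' and 3' reactions easy to compare. This hypothetical sketch totals only the programmed hold times; ramp times are ignored, and no initial or final holds are given for these reactions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    """One hold in a thermocycler program."""
    temp_c: float
    minutes: float

def hold_minutes(cycles: int, steps: list[Step]) -> float:
    """Total programmed hold time for `cycles` repeats of `steps` (ramps ignored)."""
    return cycles * sum(step.minutes for step in steps)

# N111A reactions: 94°C / 60°C / 72°C, with a longer extension for the 3' PCR
five_prime = [Step(94, 1.0), Step(60, 1.0), Step(72, 1.0)]
three_prime = [Step(94, 1.0), Step(60, 1.0), Step(72, 1.5)]
print(hold_minutes(25, five_prime))   # 75.0
print(hold_minutes(25, three_prime))  # 87.5
```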

3. AT2K255IC3 Mutation
Preparation of a non-endogenous, constitutively activated human AT1 was
accomplished by creating an AT2K255IC3 "domain swap" mutation (see, SEQ.ID.NO.: 99
for nucleic acid sequence, and SEQ.ID.NO.: 100 for amino acid sequence). Restriction
sites flanking IC3 of AT1 were generated to facilitate replacement of the IC3 with the
corresponding IC3 from the angiotensin II type 2 receptor (AT2). This was accomplished by
performing two PCR reactions. A 5' PCR fragment (Fragment A) encoding from the 5'
untranslated region to the beginning of IC3 was generated by utilizing SEQ.ID.NO.: 63 as
sense primer and the following sequence:

5'-TCCGAATTCCAAAATAACTTGTAAGAATGATCAGAAA-3' (SEQ.ID.NO.: 101)

as antisense primer. A 3' PCR fragment (Fragment B) encoding from the end of IC3 to the
3' untranslated region was generated by using the following sequence:
5'-AGATCTTAAGAAGATAATTATGGCAATTGTGCT-3' (SEQ.ID.NO.: 102)
as sense primer and SEQ.ID.NO.: 64 as antisense primer. The PCR condition was 30
cycles of 94°C for 1 min, 55°C for 1 min and 72°C for 1.5 min using the endogenous AT1
cDNA clone as template and Pfu polymerase (Stratagene), with the buffer systems
provided by the manufacturer, supplemented with 10% DMSO, 0.25 µM of each primer,
and 0.5 mM of each of the four nucleotides. Fragment A (720 bp) was digested with HindIII and
EcoRI and subcloned. Fragment B was digested with BamHI and subcloned into a pCMV
vector with an EcoRI site 5' to the cloned PCR fragment.

The DNA fragment (Fragment C) encoding IC3 of AT2 with an L255K mutation
and containing an EcoRI cohesive end at 5' and an AflII cohesive end at 3' was generated
by annealing 2 synthetic oligonucleotides having the following sequences:

5'-AATTCGAAAACACTTACTGAAGACGAATAGCTATGGGAAGAACAGGATAACCCGTGACCAAG-3' (sense; SEQ.ID.NO.: 103)

5'-TTAACTTGGTCACGGGTTATCCTGTTCTTCCCATAGCTATTCGTCTTCAGTAAGTGTTTTCG-3' (antisense; SEQ.ID.NO.: 104).

Fragment C was inserted in front of Fragment B through the EcoRI and AflII sites. The
resulting clone was then ligated with Fragment A through the EcoRI site to generate AT1
with AT2K255IC3.
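The two Fragment C oligonucleotides are designed to anneal with 4-nt 5' overhangs: AATT, compatible with an EcoRI-cut end, and TTAA, compatible with an AflII-cut end. A minimal sketch, assuming exactly 4-nt 5' overhangs on each strand, checks that the remaining cores are perfect reverse complements (helper names are illustrative):

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def overhangs_after_annealing(sense: str, antisense: str, n: int = 4):
    """Return the (sense, antisense) 5' overhangs if the n-trimmed cores pair perfectly, else None."""
    sense_core, antisense_core = sense[n:], antisense[n:]
    if reverse_complement(antisense_core) != sense_core.upper():
        return None
    return sense[:n], antisense[:n]

SEQ_103 = "AATTCGAAAACACTTACTGAAGACGAATAGCTATGGGAAGAACAGGATAACCCGTGACCAAG"  # sense
SEQ_104 = "TTAACTTGGTCACGGGTTATCCTGTTCTTCCCATAGCTATTCGTCTTCAGTAAGTGTTTTCG"  # antisense
print(overhangs_after_annealing(SEQ_103, SEQ_104))  # ('AATT', 'TTAA')
```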



4. A243+ Mutation

Preparation of a non-endogenous human AT1 receptor was also accomplished by
creating an A243+ mutation (see, SEQ.ID.NO.: 105 for nucleic acid sequence, and
SEQ.ID.NO.: 106 for amino acid sequence). The A243+ mutation was constructed using the
following PCR-based strategy: two PCR reactions were performed using Pfu polymerase
(Stratagene) with the buffer system provided by the manufacturer, supplemented with 10%
DMSO, 0.25 µM of each primer, and 0.5 mM of each of the four nucleotides. The 5' PCR sense primer
utilized had the following sequence:

5'-CCCAAGCTTCCCCAGGTGTATTTGAT-3' (SEQ.ID.NO.: 107)
and the antisense primer had the following sequence:

5'-AAGCACAATTGCTGCATAATTATCTTAAAAATATCATC-3' (SEQ.ID.NO.: 108).

The 3' PCR sense primer utilized, containing the Ala insertion, had the following sequence:

5'-AAGATAATTATGGCAGCAATTGTGCTTTTC-3' (SEQ.ID.NO.: 109)

and the antisense primer had the following sequence:

5'-GTTGGATCCACATAATGCATTTTCTC-3' (SEQ.ID.NO.: 110).

The cycle condition was 25 cycles of 94°C for 1 min, 54°C for 1 min and 72°C for 1.5 min.
Aliquots of the 5' and 3' PCR products were then used as co-templates to perform a secondary PCR
using the 5' PCR sense primer and the 3' PCR antisense primer. The PCR condition was the
same as for the primary PCR except that the extension time was 2.5 min. The resulting PCR fragment
was digested with HindIII and BamHI and subcloned into the pCMV vector (see, SEQ.ID.NO.: 105).

b. CCKB

Preparation of the non-endogenous, constitutively activated human CCKB receptor
was accomplished by creating a V322K mutation (see, SEQ.ID.NO.: 111 for nucleic acid
sequence and SEQ.ID.NO.: 112 for amino acid sequence). Mutagenesis was performed by
PCR amplification using the wild-type CCKB from Example 1.

The first PCR fragment (1 kb) was amplified by using SEQ.ID.NO.: 75 and an
antisense primer comprising a V322K mutation:

5'-CAGCAGCATGCGCTTCACGCGCTTCTTAGCCCAG-3' (SEQ.ID.NO.: 113).

The second PCR fragment (0.44 kb) was amplified by using a sense primer comprising the
V322K mutation:
5'-AGAAGCGCGTGAAGCGCATGCTGCTGGTGATCGTT-3' (SEQ.ID.NO.: 114) and SEQ.ID.NO.: 76.

The two resulting PCR fragments were then used as template for amplifying CCKB
comprising V332K, using SEQ.ID.NO.: 75 and SEQ.ID.NO.: 76 and the above-noted
system and conditions. The resulting 1.44 kb PCR fragment containing the V332K
mutation was digested with HindIII and EcoRI and cloned into the HindIII-EcoRI site of the
pCMV expression vector (see, SEQ.ID.NO.: 111).
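The CCKB mutagenesis follows an overlap-extension scheme: the two mutagenic primers share a stretch of sequence, so the two first-round products can anneal and prime each other in the second-round amplification. The sketch below reports the longest suffix of the top-strand form of SEQ.ID.NO.: 113 that is also a prefix of SEQ.ID.NO.: 114. That definition of the overlap, and the helper names, are assumptions made for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def suffix_prefix_overlap(left: str, right: str) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), 0, -1):
        if left[-size:] == right[:size]:
            return size
    return 0

SEQ_113 = "CAGCAGCATGCGCTTCACGCGCTTCTTAGCCCAG"    # antisense mutagenic primer
SEQ_114 = "AGAAGCGCGTGAAGCGCATGCTGCTGGTGATCGTT"   # sense mutagenic primer

top_strand_113 = reverse_complement(SEQ_113)
n = suffix_prefix_overlap(top_strand_113, SEQ_114)
print(n, SEQ_114[:n])  # 26 AGAAGCGCGTGAAGCGCATGCTGCTG
```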

3. QuikChange™ Site-Directed™ Mutagenesis

Preparation of non-endogenous human GPCRs can also be accomplished by using the
QuikChange™ Site-Directed™ Mutagenesis Kit (Stratagene), according to the manufacturer's
instructions. The endogenous GPCR is preferably used as a template, and two mutagenesis
primers are utilized, most preferably a lysine mutagenesis oligonucleotide and a
selection marker oligonucleotide (included in kit). For convenience, the codon mutation
incorporated into each human GPCR and the respective oligonucleotides are noted, in standard
form (Table H):
TABLE H

Receptor    Codon     Lysine Mutagenesis (SEQ.ID.NO.)            Selection Marker (SEQ.ID.NO.)
Identifier  Mutation  5'-3' orientation, mutation underlined     5'-3' orientation

hCHN3       S284K     ATGGAGAAAAGAATCAAAAGAATGTTCTATATA (115)    TATATAGAACATTCTTTTGATTCTTTTCTCCAT (116)
hCHN6       L352K     CGCTCTCTGGCCTTGAAGCGCACGCTCAGC (117)       GCTGAGCGTGCGCTTCAAGGCCAGAGAGCG (118)
hCHN8       N235K     CCCAGGAAAAAGGTGAAAGTCAAAGTTTTC (119)       GAAAACTTTGACTTTCACCTTTTTCCTGGG (120)
hCHN9       G223K     GGGGCGCGGGTGAAACGGCTGGTGAGC (121)          GCTCACCAGCCGTTTCACCCGCGCCCC (122)
hCHN10      L231K     CCCCTTGAAAAGCCTAAGAACTTGGTCATC (123)       GATGACCAAGTTCTTAGGCTTTTCAAGGGG (124)
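In each row of Table H, the selection marker oligonucleotide appears to be the exact reverse complement of the lysine mutagenesis oligonucleotide, consistent with QuikChange-style mutagenesis using a complementary primer pair. The sketch below checks this for every row, using the sequences as transcribed in the table (the helper name is hypothetical):

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]

# (receptor, lysine mutagenesis oligo, selection marker oligo), both 5'-3'
TABLE_H = [
    ("hCHN3", "ATGGAGAAAAGAATCAAAAGAATGTTCTATATA", "TATATAGAACATTCTTTTGATTCTTTTCTCCAT"),
    ("hCHN6", "CGCTCTCTGGCCTTGAAGCGCACGCTCAGC", "GCTGAGCGTGCGCTTCAAGGCCAGAGAGCG"),
    ("hCHN8", "CCCAGGAAAAAGGTGAAAGTCAAAGTTTTC", "GAAAACTTTGACTTTCACCTTTTTCCTGGG"),
    ("hCHN9", "GGGGCGCGGGTGAAACGGCTGGTGAGC", "GCTCACCAGCCGTTTCACCCGCGCCCC"),
    ("hCHN10", "CCCCTTGAAAAGCCTAAGAACTTGGTCATC", "GATGACCAAGTTCTTAGGCTTTTCAAGGGG"),
]

for receptor, lysine_oligo, marker_oligo in TABLE_H:
    # True when the marker oligo is the reverse complement of the lysine oligo
    print(receptor, reverse_complement(lysine_oligo) == marker_oligo)
```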
Example 3

Receptor Expression

Although a variety of cells are available to the art for the expression of proteins, it is
most preferred that mammalian cells be utilized. The primary reason for this is practical:
utilization of, e.g., yeast cells for the expression of a GPCR, while possible, introduces into
the protocol a non-mammalian cell which may not (indeed, in the case of yeast, does not)
include the receptor-coupling, genetic-mechanism and secretory pathways that have evolved
for mammalian systems. Thus, results obtained in non-mammalian cells, while of potential
use, are not as preferred as those obtained from mammalian cells. Of the mammalian cells,
COS-7, 293 and 293T cells are particularly preferred, although the specific mammalian cell
utilized can be predicated upon the particular needs of the artisan.

On day one, 1 x 10^7 293T cells per 150 mm plate were plated out. On day two, two
reaction tubes were prepared (the proportions to follow for each tube are per plate): tube A
was prepared by mixing 20 µg DNA (e.g., pCMV vector; pCMV vector with receptor
cDNA, etc.) in 1.2 ml serum-free DMEM (Irvine Scientific, Irvine, CA); tube B was
prepared by mixing 120 µl Lipofectamine (Gibco BRL) in 1.2 ml serum-free DMEM. Tubes
A and B were admixed by inversion (several times), followed by incubation at room
temperature for 30-45 min. The admixture is referred to as the "transfection mixture".
Plated 293T cells were washed with 1X PBS, followed by addition of 10 ml serum-free
DMEM; 2.4 ml of the transfection mixture was added to the cells, followed by incubation
for 4 hrs at 37°C/5% CO2. The transfection mixture was removed by aspiration, followed
by the addition of 25 ml of DMEM/10% Fetal Bovine Serum. Cells were incubated at
37°C/5% CO2. After 72 hr incubation, cells were harvested and utilized for analysis.
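When several plates are transfected at once, the per-plate volumes above can be pooled into master mixes. This is a sketch only: the 10% overage for pipetting losses is an assumption not found in the text, and the DNA amount, which scales by the same factor, is left out.

```python
# Per-plate volumes from the transfection protocol above
PER_PLATE = {
    "tube A serum-free DMEM (ml)": 1.2,
    "tube B serum-free DMEM (ml)": 1.2,
    "tube B Lipofectamine (ul)": 120.0,
}

def master_mix(plates: int, overage: float = 0.10) -> dict[str, float]:
    """Scale each per-plate quantity by plates * (1 + overage); overage is an assumption."""
    factor = plates * (1 + overage)
    return {item: round(amount * factor, 3) for item, amount in PER_PLATE.items()}

for item, amount in master_mix(plates=4).items():
    print(f"{item}: {amount}")
```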
Example 4 

ASSAYS FOR DETERMINATION OF CONSTITUTIVE ACTIVITY OF NON-ENDOGENOUS GPCRs

A variety of approaches are available for assessment of constitutive activity of the
non-endogenous human GPCRs. The following are illustrative; those of ordinary skill in
the art are credited with the ability to determine those techniques that are preferentially
beneficial for the needs of the artisan.

1. Membrane Binding Assays: [35S]GTPγS Assay

When a G protein-coupled receptor is in its active state, either as a result of ligand
binding or constitutive activation, the receptor couples to a G protein and stimulates the
release of GDP and subsequent binding of GTP to the G protein. The alpha subunit of the G
protein-receptor complex acts as a GTPase and slowly hydrolyzes the GTP to GDP, at which
point the receptor normally is deactivated. Constitutively activated receptors continue to
exchange GDP for GTP. The non-hydrolyzable GTP analog [35S]GTPγS can be utilized to
demonstrate enhanced binding of [35S]GTPγS to membranes expressing constitutively
activated receptors. The advantages of using [35S]GTPγS binding to measure constitutive
activation are that: (a) it is generically applicable to all G protein-coupled receptors; and (b) it is
proximal at the membrane surface, making it less likely to pick up molecules which affect the
intracellular cascade.

The assay utilizes the ability of G protein-coupled receptors to stimulate [35S]GTPγS
binding to membranes expressing the relevant receptors. The assay can, therefore, be used in
the direct identification method to screen candidate compounds against known, orphan and
constitutively activated G protein-coupled receptors. The assay is generic and has application
to drug discovery at all G protein-coupled receptors.

The [35S]GTPγS assay can be incubated in 20 mM HEPES and between 1 and about
20 mM MgCl2 (this amount can be adjusted for optimization of results, although 20 mM is
preferred), pH 7.4, binding buffer with between about 0.3 and about 1.2 nM [35S]GTPγS (this
amount can be adjusted for optimization of results, although 1.2 nM is preferred), 12.5 to 75
µg membrane protein (e.g., COS-7 cells expressing the receptor; this amount can be adjusted
for optimization, although 75 µg is preferred) and 1 µM GDP (this amount can be changed for
optimization) for 1 hour. Wheatgerm agglutinin beads (25 µl; Amersham) should then be
added and the mixture incubated for another 30 minutes at room temperature. The tubes are
then centrifuged at 1500 x g for 5 minutes at room temperature and then counted in a
scintillation counter.

A less costly but equally applicable alternative has been identified which also meets
the needs of large-scale screening. Flash plates™ and Wallac™ scintistrips may be utilized
to format a high-throughput [35S]GTPγS binding assay. Furthermore, using this technique,
the assay can be utilized for known GPCRs to monitor tritiated ligand binding to the receptor
while simultaneously monitoring efficacy via [35S]GTPγS binding. This is
possible because the Wallac beta counter can switch energy windows to look at both tritium-
and 35S-labeled probes. This assay may also be used to detect other types of membrane
activation events resulting in receptor activation. For example, the assay may be used to
monitor 32P phosphorylation of a variety of receptors (both G protein-coupled and tyrosine
kinase receptors). When the membranes are centrifuged to the bottom of the well, the bound
[35S]GTPγS or the 32P-phosphorylated receptor will activate the scintillant which is coated on
the wells. Scinti® strips (Wallac) have been used to demonstrate this principle. In addition, the
assay also has utility for measuring ligand binding to receptors using radioactively labeled
ligands. In a similar manner, when the radiolabeled bound ligand is centrifuged to the bottom
of the well, the scintistrip label comes into proximity with the radiolabeled ligand, resulting
in activation and detection.

2. Adenylyl Cyclase 

A Flash Plate™ Adenylyl Cyclase kit (New England Nuclear; Cat. No. SMP004A)
designed for cell-based assays can be modified for use with crude plasma membranes. The
Flash Plate wells contain a scintillant coating which also contains a specific antibody
recognizing cAMP. The cAMP generated in the wells is quantitated by direct competition
with a radioactive cAMP tracer for binding to the cAMP antibody. The following
serves as a brief protocol for the measurement of changes in cAMP levels in membranes that
express the receptors.

Transfected cells are harvested approximately three days after transfection.
Membranes were prepared by homogenization of suspended cells in buffer containing 20 mM
HEPES, pH 7.4 and 10 mM MgCl2. Homogenization is performed on ice using a Brinkman
Polytron™ for approximately 10 seconds. The resulting homogenate is centrifuged at 49,000
x g for 15 minutes at 4°C. The resulting pellet is then resuspended in buffer containing
20 mM HEPES, pH 7.4 and 0.1 mM EDTA, homogenized for 10 seconds, followed by
centrifugation at 49,000 x g for 15 minutes at 4°C. The resulting pellet can be stored at
-80°C until utilized. On the day of measurement, the membrane pellet is slowly thawed at
room temperature and resuspended in buffer containing 20 mM HEPES, pH 7.4 and 10 mM
MgCl2 (these amounts can be optimized, although the values listed herein are preferred), to
yield a final protein concentration of 0.60 mg/ml (the resuspended membranes are placed
on ice until use).

cAMP standards and Detection Buffer (comprising 2 µCi of tracer [125I]cAMP (100
µl) in 11 ml Detection Buffer) are prepared and maintained in accordance with the
manufacturer's instructions. Assay Buffer is prepared fresh for screening and contains
20 mM HEPES, pH 7.4, 10 mM MgCl2, 20 mM (Sigma), 0.1 units/ml creatine phosphokinase
(Sigma), 50 µM GTP (Sigma), and 0.2 mM ATP (Sigma); Assay Buffer can be stored on ice
until utilized. The assay is initiated by addition of 50 µl of Assay Buffer followed by addition
of 50 µl of membrane suspension to the NEN Flash Plate. The resultant assay mixture is
incubated for 60 minutes at room temperature followed by addition of 100 µl of Detection
Buffer. Plates are then incubated an additional 2-4 hours followed by counting in a Wallac
MicroBeta™ scintillation counter. Values of cAMP/well are extrapolated from a standard
cAMP curve that is contained within each assay plate.
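Because the Flash Plate assay is a competition format, counts fall as cAMP rises, so each sample is read off the plate's own standard curve. One simple approach, assumed here rather than taken from the kit instructions, is linear interpolation in log10(cAMP) between the two bracketing standards; a four-parameter logistic fit is a common alternative. The standard values in the usage lines are hypothetical.

```python
import math

def camp_from_counts(counts: float, standards: list[tuple[float, float]]) -> float:
    """Estimate pmol cAMP/well from counts using the plate's own standards.

    `standards` holds (pmol_cAMP, counts) pairs with pmol > 0 (the zero standard
    cannot be placed on a log scale). Counts fall as cAMP rises, so the sample is
    interpolated linearly in log10(pmol) between the two standards that bracket it.
    """
    points = sorted(standards)  # ascending pmol, hence descending counts
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        if y2 <= counts <= y1:
            if y1 == y2:
                return x1
            fraction = (y1 - counts) / (y1 - y2)
            log_pmol = math.log10(x1) + fraction * (math.log10(x2) - math.log10(x1))
            return 10 ** log_pmol
    raise ValueError("counts fall outside the range of the standard curve")

# Hypothetical standard curve: (pmol cAMP/well, counts)
STANDARDS = [(1.0, 9000.0), (10.0, 5000.0), (50.0, 2000.0)]
print(round(camp_from_counts(7000.0, STANDARDS), 2))  # 3.16
```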

C. Reporter-Based Assays 

1. CREB Reporter Assay (Gs-associated receptors) 

A method to detect Gs stimulation depends on the known property of the transcription
factor CREB, which is activated in a cAMP-dependent manner. A PathDetect™ CREB trans-Reporting
System (Stratagene, Catalogue # 219010) can be utilized to assay for Gs-coupled
activity in 293 or 293T cells. Cells are transfected with the plasmid components of this
system and the indicated expression plasmid encoding endogenous or mutant receptor
using a Mammalian Transfection Kit (Stratagene, Catalogue #200285) according to the
manufacturer's instructions. Briefly, 400 ng pFR-Luc (luciferase reporter plasmid containing
Gal4 recognition sequences), 40 ng pFA2-CREB (Gal4-CREB fusion protein containing the
Gal4 DNA-binding domain), 80 ng pCMV-receptor expression plasmid (comprising the
receptor) and 20 ng CMV-SEAP (secreted alkaline phosphatase expression plasmid; alkaline
phosphatase activity is measured in the media of transfected cells to control for variations in
transfection efficiency between samples) are combined in a calcium phosphate precipitate as
per the Kit's instructions. Half of the precipitate is equally distributed over 3 wells in a 96-well
plate, kept on the cells overnight, and replaced with fresh medium the following morning.
Forty-eight (48) hr after the start of the transfection, cells are treated and assayed for, e.g.,
luciferase activity.
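The CMV-SEAP plasmid is included so that luciferase output can be corrected for well-to-well differences in transfection efficiency. A minimal sketch of one such correction, a luciferase/SEAP ratio followed by fold change over a control such as vector-only wells, with hypothetical readings and illustrative function names:

```python
def normalized_signal(luciferase_rlu: float, seap_activity: float) -> float:
    """Luciferase output per unit of SEAP activity measured in the same well's medium."""
    if seap_activity <= 0:
        raise ValueError("SEAP activity must be positive")
    return luciferase_rlu / seap_activity

def fold_over_control(sample: tuple[float, float], control: tuple[float, float]) -> float:
    """Ratio of normalized signals, each given as (luciferase_rlu, seap_activity)."""
    return normalized_signal(*sample) / normalized_signal(*control)

# Hypothetical readings: receptor-transfected wells vs. vector-only wells
print(fold_over_control((12000.0, 2.0), (3000.0, 1.5)))  # 3.0
```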

2. AP-1 reporter assay (Gq-associated receptors)

A method to detect Gq stimulation depends on the known property of Gq-dependent
phospholipase C to cause the activation of genes containing AP-1 elements in their promoter.
A PathDetect™ AP-1 cis-Reporting System (Stratagene, Catalogue # 219073) can be utilized
following the protocol set forth above with respect to the CREB reporter assay, except that
the components of the calcium phosphate precipitate are 410 ng pAP1-Luc, 80 ng pCMV-receptor
expression plasmid, and 20 ng CMV-SEAP.

3. Cre-Luc Reporter Assay 

293 and 293T cells are plated out on 96-well plates at a density of 2 x 10^4 cells per
well and were transfected using Lipofectamine Reagent (BRL) the following day according
to the manufacturer's instructions. A DNA/lipid mixture is prepared for each 6-well transfection
as follows: 260 ng of plasmid DNA in 100 µl of DMEM were gently mixed with 2 µl of lipid
in 100 µl of DMEM (the 260 ng of plasmid DNA consisted of 200 ng of an 8xCRE-Luc reporter
plasmid (see below and Figure 1 for a representation of a portion of the plasmid), 50 ng of
pCMV comprising endogenous receptor or non-endogenous receptor or pCMV alone, and
10 ng of a GPRS expression plasmid (GPRS in pcDNA3 (Invitrogen))). The 8xCRE-Luc
reporter plasmid was prepared as follows: vector SRIF-β-gal was obtained by cloning the rat
somatostatin promoter (-71/+51) at the BglV-HindIII site in the pβgal-Basic Vector (Clontech).
Eight (8) copies of cAMP response element were obtained by PCR from an adenovirus
template AdpCF126CCRE8 (see, 7 Human Gene Therapy 1883 (1996)) and cloned into the
SRIF-β-gal vector at the Kpn-BglV site, resulting in the 8xCRE-β-gal reporter vector. The
8xCRE-Luc reporter plasmid was generated by replacing the beta-galactosidase gene in the
8xCRE-β-gal reporter vector with the luciferase gene obtained from the pGL3-basic vector
(Promega) at the HindIII-BamHI site. Following 30 min incubation at room temperature, the
DNA/lipid mixture was diluted with 400 µl of DMEM and 100 µl of the diluted mixture was
added to each well. 100 µl of DMEM with 10% FCS were added to each well after a 4 hr
incubation in a cell culture incubator. The following day the transfected cells were changed
with 200 µl/well of DMEM with 10% FCS. Eight (8) hours later, the wells were changed to
100 µl/well of DMEM without phenol red, after one wash with PBS. Luciferase activity was
measured the next day using the LucLite™ reporter gene assay kit (Packard) following the
manufacturer's instructions and read on a 1450 MicroBeta™ scintillation and luminescence
counter (Wallac).
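The per-well plasmid amounts in the Cre-Luc protocol follow from the mixture volumes. This sketch assumes the 2 µl of lipid is counted within its 100 µl of DMEM, so the diluted mixture is 600 µl and supplies six 100 µl wells:

```python
# Plasmid amounts (ng) in one 6-well DNA/lipid mixture, as listed in the text
PLASMIDS_NG = {
    "8xCRE-Luc reporter": 200,
    "pCMV receptor or pCMV alone": 50,
    "GPRS expression plasmid": 10,
}
# Assumption: the 2 ul of lipid is counted within its 100 ul of DMEM
MIXTURE_UL = 100 + 100 + 400   # DNA in DMEM + lipid in DMEM + dilution DMEM
PER_WELL_UL = 100

total_dna_ng = sum(PLASMIDS_NG.values())
wells = MIXTURE_UL // PER_WELL_UL
per_well_ng = {name: ng * PER_WELL_UL / MIXTURE_UL for name, ng in PLASMIDS_NG.items()}

print(total_dna_ng, wells)  # 260 6
for name, ng in per_well_ng.items():
    print(f"{name}: {ng:.1f} ng/well")
```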
4. Srf-Luc Reporter Assay

One method to detect Gq stimulation depends on the known property of Gq-dependent
phospholipase C to cause the activation of genes containing serum response factors in their
promoter. A PathDetect™ SRF-Luc-Reporting System (Stratagene) can be utilized to assay
for Gq-coupled activity in, e.g., COS7 cells. Cells are transfected with the plasmid
components of the system and the indicated expression plasmid encoding endogenous or
non-endogenous GPCR using a Mammalian Transfection™ Kit (Stratagene, Catalogue #200285)
according to the manufacturer's instructions. Briefly, 410 ng SRF-Luc, 80 ng pCMV-receptor
expression plasmid and 20 ng CMV-SEAP (secreted alkaline phosphatase expression plasmid;
alkaline phosphatase activity is measured in the media of transfected cells to control for
variations in transfection efficiency between samples) are combined in a calcium phosphate
precipitate as per the manufacturer's instructions. Half of the precipitate is equally distributed
over 3 wells in a 96-well plate and kept on the cells in a serum-free medium for 24 hours. For the
last 5 hours, the cells are incubated with 1 µM angiotensin, where indicated. Cells are then lysed
and assayed for luciferase activity using a LucLite™ Kit (Packard, Cat. # 6016911) and a "Trilux
1450 Microbeta" liquid scintillation and luminescence counter (Wallac) as per the
manufacturer's instructions. The data can be analyzed using GraphPad Prism™ 2.0a
(GraphPad Software Inc.).

5. Intracellular IP3 Accumulation Assay

On day 1, cells comprising the receptors (endogenous and/or non-endogenous) can
be plated onto 24-well plates, usually 1 x 10^5 cells/well (although this number can be
optimized). On day 2 cells can be transfected by first mixing 0.25 µg DNA in 50 µl serum-free
DMEM/well and 2 µl Lipofectamine in 50 µl serum-free DMEM/well. The solutions
are gently mixed and incubated for 15-30 min at room temperature. Cells are washed with
0.5 ml PBS, and 400 µl of serum-free media is mixed with the transfection media and
added to the cells. The cells are then incubated for 3-4 hrs at 37°C/5% CO2, and then the
transfection media is removed and replaced with 1 ml/well of regular growth media. On
day 3 the cells are labeled with 3H-myo-inositol. Briefly, the media is removed and the
cells are washed with 0.5 ml PBS. Then 0.5 ml inositol-free/serum-free media (GIBCO
BRL) is added per well with 0.25 µCi of 3H-myo-inositol per well, and the cells are incubated for
16-18 hrs (overnight) at 37°C/5% CO2. On day 4 the cells are washed with 0.5 ml PBS and 0.45
ml of assay medium is added containing inositol-free/serum-free media, 10 µM pargyline and
10 mM lithium chloride, or 0.4 ml of assay medium and 50 µl of 10x ketanserin (ket) to a
final concentration of 10 µM. The cells are then incubated for 30 min at 37°C. The cells
are then washed with 0.5 ml PBS and 200 µl of fresh, ice-cold stop solution (1M KOH; 18
mM Na-borate; 3.8 mM EDTA) is added per well. The solution is kept on ice for 5-10 min or
until the cells are lysed and then neutralized by 200 µl of fresh, ice-cold neutralization solution
(7.5% HCl). The lysate is then transferred into 1.5 ml Eppendorf tubes and 1 ml of
chloroform/methanol (1:2) is added per tube. The solution is vortexed for 15 sec and the
upper phase is applied to a Biorad AG1-X8™ anion exchange resin (100-200 mesh).
First, the resin is washed with water at 1:1.25 W/V and 0.9 ml of upper phase is loaded
onto the column. The column is washed with 10 ml of 5 mM myo-inositol and 10 ml of 5
mM Na-borate/60 mM Na-formate. The inositol tris phosphates are eluted into scintillation
vials containing 10 ml of scintillation cocktail with 2 ml of 0.1 M formic acid/1 M
ammonium formate. The columns are regenerated by washing with 10 ml of 0.1 M formic
acid/3 M ammonium formate, rinsed twice with ddH2O and stored at 4°C in water.




Exemplary results are presented below in Table I: 
TABLE I

Receptor   Mutation     Assay Utilized         Signal Generated:         Signal Generated:           Percent
                                               Endogenous Version        Non-Endogenous Version      Difference
                                               (Relative Light Units)    (Relative Light Units)

hAT1       F239K        SRF-LUC                34                        137                         75% ↑
hAT1       AT2K255IC3   SRF-LUC                34                        127                         73% ↑
hTDAG8     I225K        CRE-LUC (293 cells)    2,715                     14,440                      81% ↑
hTDAG8     I225K        CRE-LUC (293T cells)   65,681                    185,636                     65% ↑
hH9        F236K        CRE-LUC                1,887                     6,096                       69% ↑
hCCKB      V332K        CRE-LUC                785                       3,223                       76% ↑
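The Percent Difference column of Table I is consistent with expressing the increase relative to the non-endogenous signal, i.e. (non-endogenous minus endogenous) / non-endogenous x 100, rounded to a whole percent. That formula is inferred from the tabulated numbers rather than stated in the text; the sketch below reproduces every row:

```python
def percent_difference(endogenous: float, non_endogenous: float) -> int:
    """Increase expressed as a percentage of the non-endogenous signal (inferred formula)."""
    return round(100 * (non_endogenous - endogenous) / non_endogenous)

# (receptor and mutation, endogenous RLU, non-endogenous RLU, reported percent)
TABLE_I = [
    ("hAT1 F239K", 34, 137, 75),
    ("hAT1 AT2K255IC3", 34, 127, 73),
    ("hTDAG8 I225K (293)", 2715, 14440, 81),
    ("hTDAG8 I225K (293T)", 65681, 185636, 65),
    ("hH9 F236K", 1887, 6096, 69),
    ("hCCKB V332K", 785, 3223, 76),
]

for name, endo, non_endo, reported in TABLE_I:
    print(name, percent_difference(endo, non_endo), reported)
```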



C. Cell-Based Detection Assay (Example: TDAG8)
293 cells were plated out on 150 mm plates at a density of 1.3 x 10^7 cells per plate and
transfected using 12 µg of the respective DNA and 60 µl of Lipofectamine Reagent
(BRL) per plate. The transfected cells were grown in media containing serum for an assay
performed 24 hours post-transfection. For the detection assay performed 48 hours
post-transfection (assay comparing serum and serum-free media; see Figure 3), the initial media
was changed to either serum or serum-free media. The serum-free media consisted solely
of Dulbecco's Modified Eagle's (DME) High Glucose Medium (Irvine Scientific #9024). In
addition to the above DME Medium, the media with serum contained the following: 10%
Fetal Bovine Serum (Hyclone #SH30071.03), 1% of 100 mM Sodium Pyruvate (Irvine
Scientific #9334), 1% of 20 mM L-Glutamine (Irvine Scientific #9317), and 1% of
Penicillin-Streptomycin solution (Irvine Scientific #9366).

A 96-well Adenylyl Cyclase Activation Flashplate™ was used (NEN: #SMP004A).
First, 50 µl of the standards for the assay were added to the plate, in duplicate, ranging from
concentrations of 50 pmol to zero pmol cAMP per well. The standard cAMP (NEN:
#SMP004A) was reconstituted in water, and serial dilutions were made using 1x PBS (Irvine
Scientific: #9240). Next, 50 µl of the stimulation buffer (NEN: #SMP004A) was added to all
wells. In the case of using compounds to measure activation or inactivation of cAMP, 10 µl
of each compound, diluted in water, was added to its respective well, in triplicate. Final
concentrations used ranged from 1 µM up to 1 mM. Adenosine 5'-triphosphate (ATP)
(Research Biochemicals International: #A-141) and adenosine 5'-diphosphate (ADP) (Sigma:
#A2754) were used in the assay. Next, the 293 cells transfected with the respective cDNA
(CMV or TDAG8) were harvested 24 (assay detection in serum media) or 48 hours
post-transfection (assay detection comparing serum and serum-free media). The media was
aspirated and the cells washed once with 1x PBS. Then 5 ml of 1x PBS was added to the cells
along with 3 ml of cell dissociation buffer (Sigma: #C-1544). The detached cells were
transferred to a centrifuge tube and centrifuged at room temperature for five minutes. The
supernatant was removed and the cell pellet was resuspended in an appropriate amount of
1x PBS to obtain a final concentration of 2 x 10^6 cells per milliliter. To the wells containing the
compound, 50 µl of the cells in 1x PBS (1 x 10^5 cells/well) were added. The plate was incubated
on a shaker for 15 minutes at room temperature. The detection buffer containing the tracer
cAMP was then prepared: to 11 ml of detection buffer (NEN: #SMP004A), 50 µl (equal to 1 µCi)
of [125I]cAMP (NEN: #SMP004A) was added. Following incubation, 50 µl of this detection
buffer containing tracer cAMP was added to each well. The plate was placed on a shaker and
incubated at room temperature for two hours. Finally, the solution from the wells of the plate
was aspirated and the Flashplate was counted using the Wallac MicroBeta™ scintillation
counter.
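The cell numbers in this protocol are linked by simple arithmetic: at 2 x 10^6 cells/ml, 50 µl per well delivers 1 x 10^5 cells. A small sketch with hypothetical helper names; the 3 x 10^7 cell pellet in the usage line is invented for illustration:

```python
def resuspension_volume_ml(total_cells: float, cells_per_ml: float = 2e6) -> float:
    """Volume of 1x PBS that brings a cell pellet to the target density."""
    return total_cells / cells_per_ml

def cells_per_well(cells_per_ml: float = 2e6, volume_ul: float = 50.0) -> float:
    """Cells delivered by pipetting `volume_ul` of suspension into one well."""
    return cells_per_ml * volume_ul / 1000.0

print(cells_per_well())             # 100000.0
print(resuspension_volume_ml(3e7))  # 15.0 (hypothetical pellet of 3 x 10^7 cells)
```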

In Figure 2A, ATP and ADP bind to endogenous TDAG8, resulting in an increase
of cAMP of about 59% and about 55%, respectively. Figure 2B evidences ATP and ADP
binding to endogenous TDAG8 where endogenous TDAG8 was transfected and grown in
serum and serum-free medium. ATP binding to endogenous TDAG8 grown in serum
media evidences an increase in cAMP of about 65%, compared to the endogenous TDAG8
with no compounds; in serum-free media there was an increase of about 68%. ADP
binding to endogenous TDAG8 in serum evidences about a 61% increase, while in serum-free
media ADP binding evidences an increase of about 62%. ATP and ADP bind to
endogenous TDAG8 with EC50 values of 139.8 µM and 120.5 µM, respectively (data not
shown).
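An EC50 is the concentration giving half of the maximal response. The reported EC50 values can be read through a simple Hill model; the Hill coefficient of 1 is an assumption, since the underlying concentration-response curves are not shown:

```python
def fractional_response(conc_um: float, ec50_um: float, hill: float = 1.0) -> float:
    """Fraction of the maximal response predicted by a simple Hill model."""
    if conc_um <= 0:
        return 0.0
    return 1.0 / (1.0 + (ec50_um / conc_um) ** hill)

EC50_UM = {"ATP": 139.8, "ADP": 120.5}  # reported values for endogenous TDAG8
for ligand, ec50 in EC50_UM.items():
    # Half-maximal at the EC50; about 91% at ten times the EC50 when hill = 1
    print(ligand, fractional_response(ec50, ec50), round(fractional_response(10 * ec50, ec50), 3))
```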

Although the results presented in Figure 2B indicate substantially the same results
when serum and serum-free media were compared, our choice is to use serum-based
media, although serum-free media can also be utilized.
Example 6 

GPCR Fusion Protein Preparation 

The design of the constitutively activated GPCR-G protein fusion construct was
accomplished as follows: both the 5' and 3' ends of the rat G protein Gsα (long form; Itoh,
H. et al., 83 PNAS 3776 (1986)) were engineered to include a HindIII (5'-AAGCTT-3')
sequence thereon. Following confirmation of the correct sequence (including the flanking
HindIII sequences), the entire sequence was shuttled into pcDNA3.1(-) (Invitrogen, cat. no.
V795-20) by subcloning using the HindIII restriction site of that vector. The correct
orientation for the Gsα sequence was determined after subcloning into pcDNA3.1(-). The
modified pcDNA3.1(-) containing the rat Gsα gene at the HindIII site was then verified; this
vector was now available as a "universal" Gsα protein vector. The pcDNA3.1(-) vector
contains a variety of well-known restriction sites upstream of the HindIII site, thus
beneficially providing the ability to insert, upstream of the Gs protein, the coding sequence
of an endogenous, constitutively active GPCR. This same approach can be utilized to create
other "universal" G protein vectors, and, of course, other commercially available or
proprietary vectors known to the artisan can be utilized; the important criterion is that the
sequence for the GPCR be upstream and in-frame with that of the G protein.
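The in-frame requirement can be checked mechanically: the receptor coding sequence (without its stop codon) plus any linker must be a whole number of codons, with no stop codon before the G protein start. The sketch below uses toy sequences only, not the actual TDAG8, H9 or Gsα sequences, and the function name is hypothetical:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def fusion_in_frame(receptor_orf: str, linker: str, g_protein_orf: str) -> bool:
    """True if the G protein start codon follows the receptor ORF (stop removed)
    and linker in the same reading frame, with no stop codon upstream of it."""
    upstream = (receptor_orf + linker).upper()
    if len(receptor_orf) % 3 != 0 or len(upstream) % 3 != 0:
        return False
    codons = [upstream[i:i + 3] for i in range(0, len(upstream), 3)]
    if STOP_CODONS.intersection(codons):
        return False
    return g_protein_orf.upper().startswith("ATG")

# Toy sequences (hypothetical): a 6-nt "receptor", a KpnI-site linker, a G protein start
print(fusion_in_frame("ATGGCC", "GGTACC", "ATGGGC"))   # True
print(fusion_in_frame("ATGGCC", "GGTACCA", "ATGGGC"))  # False: linker shifts the frame
print(fusion_in_frame("ATGTAA", "GGTACC", "ATGGGC"))   # False: receptor stop codon retained
```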

TDAG8 couples via Gs, while H9 couples via Gz. For the following exemplary GPCR
Fusion Proteins, fusion to Gsα was accomplished.

A TDAG8(I225K)-Gsα Fusion Protein construct was made using primers designed
as follows:

5'-gatcTCTAGAATGAACAGCACATGTATTGAAG-3' (SEQ.ID.NO.: 125; sense)
5'-ctagGGTACCCGCTCAAGGACCTCTAATTCCATAG-3' (SEQ.ID.NO.: 126; antisense).

Nucleotides in lowercase are included as spacers in the restriction sites between the
G protein and TDAG8. The sense and antisense primers included the restriction sites for
XbaI and KpnI, respectively.

PCR was then utilized to obtain the receptor sequence for fusion within the Gsα universal vector disclosed above, using the following protocol: 100 ng of TDAG8 cDNA was added to a tube containing 2 µl of each primer (sense and antisense), 3 µl of 10 mM dNTPs, 10 µl of 10X TaqPlus™ Precision buffer, 1 µl of TaqPlus™ Precision polymerase (Stratagene: #600211), and 80 µl of water. Reaction temperatures and cycle times for TDAG8 were as follows: the initial denaturing step was done at 94 °C for five minutes, followed by cycling at 94 °C for 30 seconds, 55 °C for 30 seconds, and 72 °C for two minutes. A final extension was done at 72 °C for ten minutes. The PCR product was run on a 1% agarose gel and then purified (data not shown). The purified product was digested with XbaI and KpnI (New England Biolabs), and the desired insert was purified and ligated into the Gs universal vector at the respective restriction sites. Positive clones were isolated following transformation and identified by restriction enzyme digest; expression using 293 cells was accomplished following the protocol set forth infra. Each positive clone for the TDAG8:Gs Fusion Protein was sequenced to verify correctness.
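The final concentrations in the TDAG8 reaction follow from C_final = C_stock × V_added / V_total. The volume of the 100 ng cDNA aliquot is not stated, so the sketch below assumes (hypothetically) 2 µl, which brings the total to 100 µl. It also reports the dNTP figure on whatever basis the "10 mM" stock uses (total or per nucleotide), since the text does not specify.

```python
# Volumes (µl) from the TDAG8 protocol. The cDNA volume is not given in the
# text; 2 µl is a hypothetical placeholder that brings the total to 100 µl.
COMPONENTS_UL = {
    "sense primer": 2.0,
    "antisense primer": 2.0,
    "10 mM dNTPs": 3.0,
    "10X TaqPlus Precision buffer": 10.0,
    "TaqPlus Precision polymerase": 1.0,
    "water": 80.0,
    "cDNA (assumed volume)": 2.0,
}


def final_concentration(stock: float, volume_added_ul: float, total_ul: float) -> float:
    """C_final = C_stock * V_added / V_total; the result has the stock's units."""
    return stock * volume_added_ul / total_ul


total_ul = sum(COMPONENTS_UL.values())
dntp_mM = final_concentration(10.0, COMPONENTS_UL["10 mM dNTPs"], total_ul)
buffer_x = final_concentration(10.0, COMPONENTS_UL["10X TaqPlus Precision buffer"], total_ul)
print(f"total {total_ul:.0f} µl, dNTPs {dntp_mM:.2f} mM, buffer {buffer_x:.1f}X")
```

Under that assumption the buffer ends at 1X and the dNTPs at 0.3 mM.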

GPCR Fusion Proteins comprising non-endogenous, constitutively activated TDAG8(I225K) were analyzed as above and verified for constitutive activation.

An H9(F236K)-Gsα Fusion Protein construct was made as follows. Primers were designed as follows:

5'-TTAgatatcGGGGCCCACCCTAGCGGT-3' (SEQ.ID.NO.: 145; sense)
5'-ggtaccCCCACAGCCATTTCATCAGGATC-3' (SEQ.ID.NO.: 146; antisense).

Nucleotides in lower case are included as spacers at the restriction sites between the G protein and H9. The sense and antisense primers included the restriction sites for EcoRV and KpnI, respectively, such that spacers (attributable to the restriction sites) exist between the G protein and H9.
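As a rough cross-check of these primers against the 55 °C annealing step used below, their template-binding portions can be given a quick melting-temperature estimate with the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C. This is a sketch under stated assumptions: the Wallace rule is only an approximation intended for short oligonucleotides and tends to overestimate for longer ones, and `annealing_region` assumes (hypothetically) that everything 3' of the last lowercase nucleotide anneals to the H9 template.

```python
H9_SENSE = "TTAgatatcGGGGCCCACCCTAGCGGT"        # SEQ.ID.NO.: 145
H9_ANTISENSE = "ggtaccCCCACAGCCATTTCATCAGGATC"  # SEQ.ID.NO.: 146


def annealing_region(primer: str) -> str:
    """Return the segment 3' of the last lowercase nucleotide.

    Assumption: for the H9 primers as written, the lowercase block is the added
    restriction site and everything 3' of it anneals to the template.
    """
    lower_positions = [i for i, base in enumerate(primer) if base.islower()]
    start = lower_positions[-1] + 1 if lower_positions else 0
    return primer[start:]


def wallace_tm(seq: str) -> int:
    """Approximate Tm in deg C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


for label, primer in (("sense", H9_SENSE), ("antisense", H9_ANTISENSE)):
    region = annealing_region(primer)
    print(label, region, wallace_tm(region))
```

With the primers as written this gives about 64 °C for the sense region and about 70 °C for the antisense region; a nearest-neighbour calculation would be the more rigorous choice.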

PCR was then utilized to obtain the receptor sequence for fusion within the Gsα universal vector disclosed above, using the following protocol: 80 ng of H9 cDNA was added to a tube containing 100 ng of each primer (sense and antisense) and 45 µl of PCR Supermix™ (Gibco-BRL, LifeTech) (50 µl total reaction volume). Reaction temperatures and cycle times for H9 were as follows: the initial denaturing step was done at 94 °C for one minute, followed by cycling at 94 °C for 30 seconds, 55 °C for 30 seconds, and 72 °C for two minutes. A final extension was done at 72 °C for seven minutes. The PCR product was run on a 1% agarose gel and then purified (data not shown). The purified product was cloned into the pCRII-TOPO™ System, followed by identification of positive clones. Positive clones were isolated and digested with EcoRV and KpnI (New England Biolabs), and the desired inserts were isolated, purified and ligated into the Gs universal vector at the respective restriction sites. Positive clones were isolated following transformation and identified by restriction enzyme digest; expression using 293 cells was accomplished following the protocol set forth infra. Each positive clone for the H9(F236K):Gs Fusion Protein was sequenced to verify correctness. Membranes were frozen (-80 °C) until utilized.
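For the H9 reaction, primer amounts are given by mass (100 ng each) rather than by molarity. A common rule of thumb of about 330 g/mol per nucleotide of single-stranded DNA allows an approximate conversion; the sketch below applies it to the 27-nt sense and 29-nt antisense primers in the 50 µl reaction. The average-mass constant is an approximation, not a value from the source.

```python
AVG_NT_MASS = 330.0  # g/mol per nucleotide: rule of thumb for ssDNA (approximation)

H9_SENSE = "TTAgatatcGGGGCCCACCCTAGCGGT"        # 27 nt
H9_ANTISENSE = "ggtaccCCCACAGCCATTTCATCAGGATC"  # 29 nt
REACTION_UL = 50.0


def primer_pmol(mass_ng: float, length_nt: int) -> float:
    """Approximate amount (pmol) of an oligonucleotide from its mass (ng)."""
    grams = mass_ng * 1e-9
    return grams / (length_nt * AVG_NT_MASS) * 1e12


def final_uM(amount_pmol: float, volume_ul: float) -> float:
    """Concentration in µM: 1 pmol per µl equals 1 µM."""
    return amount_pmol / volume_ul


for label, primer in (("sense", H9_SENSE), ("antisense", H9_ANTISENSE)):
    pmol = primer_pmol(100.0, len(primer))
    print(f"{label}: {len(primer)} nt, {pmol:.1f} pmol, {final_uM(pmol, REACTION_UL):.2f} µM")
```

This works out to roughly 11 pmol (about 0.22 µM) for the sense primer and roughly 10 pmol (about 0.21 µM) for the antisense primer.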

To ascertain the ability to measure a cAMP response mediated by the Gs protein (even though H9 couples with Gz), the following cAMP membrane assay was utilized, based upon an NEN Adenylyl Cyclase Activation FlashPlate™ Assay kit (96-well format). "Binding Buffer" consisted of 10 mM HEPES, 100 mM NaCl and 10 mM MgCl2 (pH 7.4). "Regeneration Buffer" was prepared in Binding Buffer and consisted of 20 mM phosphocreatine, 20 U creatine phosphokinase, 20 µM GTP, 0.2 mM ATP, and 0.6 mM IBMX. "cAMP Standards"
were prepared in Binding Buffer as follows:

Standard   cAMP Stock (5,000 pmol/ml   Added to indicated amount   Final Assay Concentration
           in 2 ml H2O), in µl         of Binding Buffer           (50 µl into 100 µl), pmol/well
A          250                         1 ml                        50
B          500 of A                    500 µl                      25
C          500 of B                    500 µl                      12.5
D          500 of C                    750 µl                      5.0
E          500 of D                    500 µl                      2.5
F          500 of E                    500 µl                      1.25
G          500 of F                    750 µl                      0.5
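The right-hand column of the standards table follows from straightforward serial-dilution arithmetic. The sketch below recomputes it, assuming that each "500 of X" entry means 500 µl of the preceding standard and that 50 µl of each standard is dispensed per well; the step list simply transcribes the table.

```python
STOCK_PMOL_PER_UL = 5.0  # 5,000 pmol/ml cAMP stock = 5 pmol/µl
WELL_UL = 50.0           # each standard is dispensed at 50 µl per well

# (standard, source, µl of source, µl of Binding Buffer), transcribed from the table.
STEPS = [
    ("A", "stock", 250, 1000),
    ("B", "A", 500, 500),
    ("C", "B", 500, 500),
    ("D", "C", 500, 750),
    ("E", "D", 500, 500),
    ("F", "E", 500, 500),
    ("G", "F", 500, 750),
]


def pmol_per_well(steps):
    """Follow the serial dilution and return pmol cAMP delivered per well."""
    conc = {"stock": STOCK_PMOL_PER_UL}  # pmol/µl in each tube
    result = {}
    for name, source, v_source, v_buffer in steps:
        conc[name] = conc[source] * v_source / (v_source + v_buffer)
        result[name] = conc[name] * WELL_UL
    return result


for name, pmol in pmol_per_well(STEPS).items():
    print(name, round(pmol, 2))
```

Standard A, for instance, is 1,250 pmol in 1,250 µl (1 pmol/µl), so 50 µl delivers 50 pmol per well.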



Frozen membranes (both pCMV as control and the non-endogenous H9-Gs Fusion Protein) were thawed (on ice at room temperature until in solution). Membranes were homogenized with a polytron until in suspension (2 × 15 seconds). Membrane protein concentration was determined using the Bradford Assay Protocol (see infra). Membrane protein was diluted to 0.5 mg/ml in Regeneration Buffer (final assay concentration, 25 µg/well). Thereafter, 50 µl of Binding Buffer was added to each well. For control, 50 µl/well of cAMP standard was added to wells 11 and 12 A-G, with Binding Buffer alone in 12H (on the 96-well format). Thereafter, 50 µl/well of protein was added to the wells and incubated at room temperature (on a shaker) for 60 min. 100 µl of [125I]cAMP in Detection Buffer (see infra) was added to each well (final: 50 µl [125I]cAMP into 11 ml Detection Buffer). These were incubated for 2 hrs at room temperature. Plates were aspirated with an 8-channel manifold and sealed with plate covers. Results (pmoles cAMP bound) were read in a Wallac™ 1450 using "Prot. #15". Results are presented in Figure 3.

The results presented in Figure 3 indicate that the Gs-coupled fusion was able to "drive" the cyclase reaction such that measurement of the constitutive activation of H9(F236K) was viable. Based upon these results, the direct identification of candidate compounds that are inverse agonists, agonists and partial agonists is possible using a cyclase-based assay.
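The well assignments in the cAMP membrane assay above can be recorded as a plate map, which helps when matching reader output to standards. In this sketch, the mapping of standards A-G to rows A-G of columns 11 and 12, and the use of every remaining well except H11 for membrane samples, are assumptions; the text states only that standards occupy wells 11 and 12 A-G and that 12H receives Binding Buffer alone. It also confirms the stated 25 µg/well (0.5 mg/ml × 50 µl).

```python
ROWS = "ABCDEFGH"
STANDARD_PMOL = {"A": 50, "B": 25, "C": 12.5, "D": 5.0, "E": 2.5, "F": 1.25, "G": 0.5}


def build_plate_map() -> dict:
    """Hypothetical 96-well map for the cAMP membrane assay described above."""
    plate = {f"{r}{c}": "membrane sample (assumed)" for r in ROWS for c in range(1, 13)}
    for col in (11, 12):
        for row, pmol in STANDARD_PMOL.items():
            # Assumption: standard A sits in row A, standard B in row B, and so on.
            plate[f"{row}{col}"] = f"standard {row} ({pmol} pmol)"
    plate["H12"] = "blank (Binding Buffer only)"
    plate["H11"] = "unassigned (not specified in the text)"
    return plate


def protein_ug_per_well(conc_mg_per_ml: float, volume_ul: float) -> float:
    """1 mg/ml equals 1 µg/µl, so µg per well = concentration x volume."""
    return conc_mg_per_ml * volume_ul


plate = build_plate_map()
print(plate["A11"], plate["G12"], plate["H12"])
print(protein_ug_per_well(0.5, 50))  # 0.5 mg/ml x 50 µl
```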



Protocol: Direct Identification of Inverse Agonists and Agonists Using [35S]GTPγS

Although we have utilized endogenous, constitutively active GPCRs for the direct identification of candidate compounds as, e.g., inverse agonists, for reasons that are not altogether understood, intra-assay variation can become exacerbated. Preferably, then, a GPCR Fusion Protein, as disclosed above, is also utilized with a non-endogenous, constitutively activated GPCR. We have determined that when such a protein is used, intra-assay variation appears to be substantially stabilized, whereby an effective signal-to-noise ratio is obtained. This has the beneficial result of allowing for a more robust identification of candidate compounds. Thus, it is preferred that, for direct identification, a GPCR Fusion Protein be used and that, when utilized, the following assay protocols be followed.
Membrane Preparation 

Membranes comprising the non-endogenous, constitutively active orphan GPCR 
Fusion Protein of interest and for use in the direct identification of candidate compounds as 
inverse agonists, agonists or partial agonists are preferably prepared as follows: 

a. Materials

"Membrane Scrape Buffer" consists of 20 mM HEPES and 10 mM EDTA, pH 7.4;
"Membrane Wash Buffer" consists of 20 mM HEPES and 0.1 mM EDTA, pH 7.4;
"Binding Buffer" consists of 20 mM HEPES, 100 mM NaCl, and 10 mM MgCl2, pH 7.4.

b. Procedure 

All materials are kept on ice throughout the procedure. First, the media is aspirated from a confluent monolayer of cells, followed by a rinse with 10 ml cold PBS, followed by aspiration. Thereafter, 5 ml of Membrane Scrape Buffer is added to scrape cells; this is followed by transfer of the cellular extract into 50 ml centrifuge tubes (centrifuged at 20,000 rpm for 17 minutes at 4 °C). Thereafter, the supernatant is aspirated and the pellet is resuspended in 30 ml Membrane Wash Buffer, followed by centrifugation at 20,000 rpm for 17 minutes at 4 °C. The supernatant is then aspirated and the pellet resuspended in Binding Buffer. This is then homogenized using a Brinkman Polytron™ homogenizer (15-20 second bursts until all material is in suspension). This is referred to herein as "Membrane Protein".
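This procedure specifies centrifugation in rpm, whereas Example 7 specifies 49,000 × g. Relative centrifugal force depends on the rotor radius, RCF = 1.118 × 10⁻⁵ × r(cm) × rpm², and the radius is not given in the text, so the value used here is a hypothetical placeholder to be replaced with the actual rotor's radius.

```python
import math

RCF_CONSTANT = 1.118e-5  # standard constant in RCF = k * r(cm) * rpm^2


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) at a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Speed (rpm) needed to reach a target RCF at a given rotor radius."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


HYPOTHETICAL_RADIUS_CM = 10.0  # placeholder; use the actual rotor's radius
print(round(rcf_from_rpm(20_000, HYPOTHETICAL_RADIUS_CM)))  # g-force at 20,000 rpm
print(round(rpm_from_rcf(49_000, HYPOTHETICAL_RADIUS_CM)))  # rpm for 49,000 x g
```

With the hypothetical 10 cm radius, 20,000 rpm corresponds to roughly 44,700 × g.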
Bradford Protein Assay 

Following the homogenization, protein concentration of the membranes is determined using the Bradford Protein Assay (protein can be diluted to about 1.5 mg/ml, aliquoted and frozen (-80 °C) for later use; when frozen, the protocol for use is as follows: on the day of the assay, frozen Membrane Protein is thawed at room temperature, followed by vortexing and then homogenization with a polytron at about 12 × 1,000 rpm for about 5-10 seconds; it is noted that for multiple preparations, the homogenizer should be thoroughly cleaned between homogenization of different preparations).
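Diluting Membrane Protein to a target concentration (about 1.5 mg/ml here, and 0.5 or 0.25 mg/ml in the assays) is a C1V1 = C2V2 calculation once the Bradford result is known. A minimal sketch; the function name and the example numbers are hypothetical.

```python
def diluent_to_add(volume_ml: float, conc_mg_per_ml: float, target_mg_per_ml: float) -> float:
    """Volume of Binding Buffer (ml) to add so that C1*V1 = C2*V2.

    Hypothetical helper; raises if the sample is already below the target.
    """
    if target_mg_per_ml <= 0:
        raise ValueError("target concentration must be positive")
    if conc_mg_per_ml < target_mg_per_ml:
        raise ValueError("sample is already more dilute than the target")
    final_volume_ml = volume_ml * conc_mg_per_ml / target_mg_per_ml
    return final_volume_ml - volume_ml


# Example: 1 ml of membranes measured at 3.0 mg/ml, diluted to about 1.5 mg/ml.
print(diluent_to_add(1.0, 3.0, 1.5))
```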

a. Materials 

Binding Buffer (as per above), Bradford Dye Reagent and Bradford Protein Standard are utilized, following the manufacturer's instructions (Bio-Rad, cat. no. 500-0006).

b. Procedure 

Duplicate tubes are prepared, one including the membrane and one as a control "blank". Each contains 800 µl Binding Buffer. Thereafter, 10 µl of Bradford Protein Standard (1 mg/ml) is added to each tube, and 10 µl of Membrane Protein is then added to just one tube (not the blank). Thereafter, 200 µl of Bradford Dye Reagent is added to each tube, followed by vortexing of each. After five (5) minutes, the tubes are re-vortexed and the material therein is transferred to cuvettes. The cuvettes are then read using a CECIL 3041 spectrophotometer at a wavelength of 595 nm.

Direct Identification Assay 
a. Materials 

GDP Buffer consists of 37.5 ml Binding Buffer and 2 mg GDP (Sigma, cat. no. G-7127), followed by a series of dilutions in Binding Buffer to obtain 0.2 µM GDP (final concentration of GDP in each well was 0.1 µM GDP). Each well comprising a candidate compound has a final volume of 200 µl consisting of 100 µl GDP Buffer (final concentration, 0.1 µM GDP), 50 µl Membrane Protein in Binding Buffer, and 50 µl [35S]GTPγS (0.6 nM) in Binding Buffer (2.5 µl [35S]GTPγS per 10 ml Binding Buffer).
b. Procedure 

Candidate compounds are preferably screened using a 96-well plate format (these can be frozen at -80 °C). Membrane Protein (or membranes with expression vector excluding the GPCR Fusion Protein, as control) is homogenized briefly until in suspension. Protein concentration is then determined using the Bradford Protein Assay set forth above. Membrane Protein (and control) is then diluted to 0.25 mg/ml in Binding Buffer (final assay concentration, 12.5 µg/well). Thereafter, 100 µl GDP Buffer is added to each well of a Wallac Scintistrip™ (Wallac). A 5 µl pin-tool is then used to transfer 5 µl of a candidate compound into each such well (5 µl in a total assay volume of 200 µl is a 1:40 ratio, such that the final screening concentration of the candidate compound is 10 µM). Again, to avoid contamination, after each transfer step the pin tool should be rinsed in three reservoirs comprising water (1X), ethanol (1X) and water (2X); excess liquid should be shaken from the tool after each rinse and the tool dried with paper and Kimwipes. Thereafter, 50 µl of Membrane Protein is added to each well (a control well comprising membranes without the GPCR Fusion Protein is also utilized), and pre-incubated for 5-10 minutes at room temperature. Thereafter, 50 µl of [35S]GTPγS (0.6 nM) in Binding Buffer is added to each well, followed by incubation on a shaker for 60 minutes at room temperature (again, in this example, plates were covered with foil). The assay is then stopped by spinning the plates at 4000 RPM for 15 minutes at 22 °C. The plates are then aspirated with an 8-channel manifold and sealed with plate covers. The plates are then read on a Wallac 1450 using setting "Prot. #37" (as per manufacturer instructions).
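The concentrations stated for the [35S]GTPγS assay can be cross-checked with simple dilution arithmetic. The sketch below converts the 2 mg of GDP in 37.5 ml into a molar stock, computes the fold dilution needed to reach the 0.2 µM GDP Buffer, and derives the per-well protein and compound figures. The GDP molecular weight (443.2 g/mol, the free acid) is an assumption; the lot-specific formula weight of the salt actually supplied should be used instead.

```python
GDP_MW = 443.2  # g/mol, GDP free acid (assumption; use the lot-specific value)


def molar_uM(mass_mg: float, mw_g_per_mol: float, volume_ml: float) -> float:
    """Molar concentration (µM) of mass_mg dissolved in volume_ml."""
    moles = (mass_mg / 1000.0) / mw_g_per_mol
    return moles / (volume_ml / 1000.0) * 1e6


def final_conc(stock: float, volume_ul: float, total_ul: float) -> float:
    """Concentration after adding volume_ul of stock to a total of total_ul."""
    return stock * volume_ul / total_ul


TOTAL_UL = 200.0  # assay volume used for the 1:40 ratio in the text

gdp_stock_uM = molar_uM(2.0, GDP_MW, 37.5)         # 2 mg GDP in 37.5 ml
fold_to_buffer = gdp_stock_uM / 0.2                # dilution to 0.2 µM GDP Buffer
gdp_in_well_uM = final_conc(0.2, 100.0, TOTAL_UL)  # 100 µl GDP Buffer per well
protein_ug = 0.25 * 50.0                           # 0.25 mg/ml = 0.25 µg/µl, 50 µl
compound_stock_uM = 10.0 * TOTAL_UL / 5.0          # stock giving 10 µM from 5 µl

print(f"GDP stock {gdp_stock_uM:.1f} µM; dilute ~{fold_to_buffer:.0f}-fold")
print(f"GDP in well {gdp_in_well_uM} µM; protein {protein_ug} µg; compound stock {compound_stock_uM} µM")
```

With these inputs the stock is about 120 µM, so roughly a 600-fold dilution reaches 0.2 µM, and 12.5 µg of protein per well and a 400 µM compound stock follow from the stated volumes. Note that the listed additions (100 + 5 + 50 + 50 µl) total 205 µl, whereas the 1:40 ratio in the text is computed on 200 µl.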
Example 7 

Protocol: Confirmation Assay 

To confirm, by an independent assay approach, a candidate compound directly identified as set forth above, it is preferred that a confirmation assay then be utilized. In this case, the preferred confirmation assay is a cyclase-based assay.

A modified FlashPlate™ Adenylyl Cyclase kit (New England Nuclear; Cat. No. SMP004A) is preferably utilized for confirmation of candidate compounds directly identified as inverse agonists and agonists to non-endogenous, constitutively activated orphan GPCRs in accordance with the following protocol.

Transfected cells are harvested approximately three days after transfection. Membranes are prepared by homogenization of suspended cells in buffer containing 20 mM HEPES, pH 7.4 and 10 mM MgCl2. Homogenization is performed on ice using a Brinkman Polytron™ for approximately 10 seconds. The resulting homogenate is centrifuged at 49,000 × g for 15 minutes at 4 °C. The resulting pellet is then resuspended in buffer containing 20 mM HEPES, pH 7.4 and 0.1 mM EDTA, homogenized for 10 seconds, and then centrifuged at 49,000 × g for 15 minutes at 4 °C. The resulting pellet can be stored at -80 °C until utilized. On the day of direct identification screening, the membrane pellet is slowly thawed at room temperature and resuspended in buffer containing 20 mM HEPES, pH 7.4 and 10 mM MgCl2, to yield a final protein concentration of 0.60 mg/ml (the resuspended membranes are placed on ice until use).

cAMP standards and Detection Buffer (comprising 2 µCi of tracer [125I]cAMP (100 µl) to 11 ml Detection Buffer) are prepared and maintained in accordance with the manufacturer's instructions. Assay Buffer is prepared fresh for screening and contains 20 mM HEPES, pH 7.4, 10 mM MgCl2, 20 mM phosphocreatine (Sigma), 0.1 units/ml creatine phosphokinase (Sigma), 50 µM GTP (Sigma), and 0.2 mM ATP (Sigma); Assay Buffer can be stored on ice until utilized.

Candidate compounds identified as per above (if frozen, thawed at room temperature) are added, preferably, to 96-well plate wells (10 µl/well; 12 µM final assay concentration), together with 40 µl Membrane Protein (30 µg/well) and 50 µl of Assay Buffer. This admixture is then incubated for 30 minutes at room temperature, with gentle shaking.

Following the incubation, 100 µl of Detection Buffer is added to each well, followed by incubation for 2-24 hours. Plates are then counted in a Wallac MicroBeta™ plate reader using "Prot. #31" (as per manufacturer instructions).
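In this competitive FlashPlate™ format, bound [125I]cAMP counts fall as the cAMP produced by the membranes rises, so counts are first converted to pmol cAMP using the standards. A candidate can then be compared with an (assumed) control well without compound: a drop below the constitutive level is consistent with inverse agonism and a rise with agonism. The sketch below encodes that comparison; the 80% and 120% cut-offs are hypothetical placeholders, not values from this document.

```python
def percent_of_control(sample_pmol: float, control_pmol: float) -> float:
    """cAMP in a compound well as a percentage of the no-compound control."""
    if control_pmol <= 0:
        raise ValueError("control cAMP must be positive")
    return 100.0 * sample_pmol / control_pmol


def classify(sample_pmol: float, control_pmol: float,
             low_pct: float = 80.0, high_pct: float = 120.0) -> str:
    """Hypothetical triage rule; the cut-offs are placeholders, not from the text."""
    pct = percent_of_control(sample_pmol, control_pmol)
    if pct <= low_pct:
        return "inverse agonist candidate"  # cAMP reduced below the constitutive level
    if pct >= high_pct:
        return "agonist candidate"  # cAMP raised above the constitutive level
    return "no clear effect"


print(classify(40.0, 100.0), classify(150.0, 100.0), classify(100.0, 100.0))
```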

It is intended that each of the patents, applications, and printed publications mentioned 
in this patent document be hereby incorporated by reference in their entirety. 

As those skilled in the art will appreciate, numerous changes and modifications may 
be made to the preferred embodiments of the invention without departing from the spirit of 
the invention. It is intended that all such variations fall within the scope of the invention. 

Although a variety of expression vectors are available to those in the art, for purposes of utilization for both the endogenous and non-endogenous human GPCRs, it is most preferred that the vector utilized be pCMV. This vector was deposited with the American Type Culture Collection (ATCC) on October 13, 1998 (10801 University Blvd., Manassas, VA 20110-2209 USA) under the provisions of the Budapest Treaty on the International Recognition of the Deposit of Microorganisms for the Purposes of Patent Procedure. The DNA was tested by the ATCC and determined to be viable. The ATCC has assigned the following deposit number to pCMV: ATCC #203351.

CLAIMS

What is claimed is: 

1. A cDNA encoding a non-endogenous, constitutively activated version of a human
G protein-coupled receptor comprising hARE-3(F313K).

2. A non-endogenous version of a human G protein-coupled receptor encoded by the
cDNA of claim 1.

3. A Plasmid comprising a Vector and the cDNA of claim 1.

4. A Host Cell comprising the Plasmid of claim 3.

5. A cDNA encoding a non-endogenous, constitutively activated version of a human 
G protein-coupled receptor comprising hARE-4(V233K).

6. A non-endogenous version of a human G protein-coupled receptor encoded by the
cDNA of claim 5.

7. A Plasmid comprising a Vector and the cDNA of claim 5. 

8. A Host Cell comprising the Plasmid of claim 7. 

9. A cDNA encoding a non-endogenous, constitutively activated version of a human 
G protein-coupled receptor comprising hARE-5(A240K). 

10. A non-endogenous version of a human G protein-coupled receptor encoded by the
cDNA of claim 9.

11. A Plasmid comprising a Vector and the cDNA of claim 9.

12. A Host Cell comprising the Plasmid of claim 11.

13. A cDNA encoding a non-endogenous, constitutively activated version of a human
G protein-coupled receptor comprising hGPCR14(L257K).

14. A non-endogenous version of a human G protein-coupled receptor encoded by the
cDNA of claim 13.

15. A Plasmid comprising a Vector and the cDNA of claim 13.

16. A Host Cell comprising the Plasmid of claim 15.

17. A cDNA encoding a non-endogenous, constitutively activated version of a human 
G protein-coupled receptor comprising hGPCR27(C283K). 

18. A non-endogenous version of a human G protein-coupled receptor encoded by the
cDNA of claim 17. 

19. A Plasmid comprising a Vector and the cDNA of claim 17. 

20. A Host Cell comprising the Plasmid of claim 19. 

21. A cDNA encoding a non-endogenous, constitutively activated version of a human
G protein-coupled receptor comprising hARE-1(E232K).

22. A non-endogenous version of a human G protein-coupled receptor encoded by the
cDNA of claim 21.

23. A Plasmid comprising a Vector and the cDNA of claim 21. 

24. A Host Cell comprising the Plasmid of claim 23. 

25. A cDNA encoding a non-endogenous, constitutively activated version of a human 
G protein-coupled receptor comprising hARE-2(G285K). 

26. A non-endogenous version of a human G protein-coupled receptor encoded by the 
cDNA of claim 25. 

27. A Plasmid comprising a Vector and the cDNA of claim 25. 

28. A Host Cell comprising the Plasmid of claim 27.

29. A cDNA encoding a non-endogenous, constitutively activated version of a human
G protein-coupled receptor comprising hPPR1(L239K).

30. A non-endogenous version of a human G protein-coupled receptor encoded by the 
cDNA of claim 29. 

31. A Plasmid comprising a Vector and the cDNA of claim 29.

32. A Host Cell comprising the Plasmid of claim 31. 

33. A cDNA encoding a non-endogenous, constitutively activated version of a human 
G protein-coupled receptor comprising hG2A(K232A). 

34. A non-endogenous version of a human G protein-coupled receptor encoded by the
cDNA of claim 33.

35. A Plasmid comprising a Vector and the cDNA of claim 33. 

36. A Host Cell comprising the Plasmid of claim 35. 

37. A cDNA encoding a non-endogenous, constitutively activated version of a human 
G protein-coupled receptor comprising hRUP3(L224K). 

38. A non-endogenous version of a human G protein-coupled receptor encoded by the
cDNA of claim 37.

39. A Plasmid comprising a Vector and the cDNA of claim 37. 

40. A Host Cell comprising the Plasmid of claim 39. 

41. A cDNA encoding a non-endogenous, constitutively activated version of a human
G protein-coupled receptor comprising hRUP5(A236K).

42. A non-endogenous version of a human G protein-coupled receptor encoded by the
cDNA of claim 41.

43. A Plasmid comprising a Vector and the cDNA of claim 41.

44. A Host Cell comprising the Plasmid of claim 43.

45. A cDNA encoding a non-endogenous, constitutively activated version of a human 
G protein-coupled receptor comprising hRUP6(N267K).

46. A non-endogenous version of a human G protein-coupled receptor encoded by the
cDNA of claim 45.

47. A Plasmid comprising a Vector and the cDNA of claim 45. 

48. A Host Cell comprising the Plasmid of claim 47. 

49. A cDNA encoding a non-endogenous, constitutively activated version of a human 
G protein-coupled receptor comprising hRUP7(A302K). 

50. A non-endogenous version of a human G protein-coupled receptor encoded by the
cDNA of claim 49.

51. A Plasmid comprising a Vector and the cDNA of claim 49.

52. A Host Cell comprising the Plasmid of claim 51.

53. A cDNA encoding a non-endogenous, constitutively activated version of a human
G protein-coupled receptor comprising hCHN4(V236K).

54. A non-endogenous version of a human G protein-coupled receptor encoded by the
cDNA of claim 53.

55. A Plasmid comprising a Vector and the cDNA of claim 53. 

56. A Host Cell comprising the Plasmid of claim 55. 

57. A cDNA encoding a non-endogenous, constitutively activated version of a human
G protein-coupled receptor comprising hMC4(A244K).

58. A non-endogenous version of a human G protein-coupled receptor encoded by the
cDNA of claim 57.

59. A Plasmid comprising a Vector and the cDNA of claim 57.

60. A Host Cell comprising the Plasmid of claim 59.

61. A cDNA encoding a non-endogenous, constitutively activated version of a human 
G protein-coupled receptor comprising hCHN3(S284K). 

62. A non-endogenous version of a human G protein-coupled receptor encoded by the
cDNA of claim 61.

63. A Plasmid comprising a Vector and the cDNA of claim 61.

64. A Host Cell comprising the Plasmid of claim 63. 

65. A cDNA encoding a non-endogenous, constitutively activated version of a human 
G protein-coupled receptor comprising hCHN6(L352K). 

66. A non-endogenous version of a human G protein-coupled receptor encoded by the
cDNA of claim 65.

67. A Plasmid comprising a Vector and the cDNA of claim 65. 

68. A Host Cell comprising the Plasmid of claim 67. 

69. A cDNA encoding a non-endogenous, constitutively activated version of a human 
G protein-coupled receptor comprising hCHN8(N235K). 

70. A non-endogenous version of a human G protein-coupled receptor encoded by the
cDNA of claim 69.

71. A Plasmid comprising a Vector and the cDNA of claim 69.

72. A Host Cell comprising the Plasmid of claim 71.

73. A cDNA encoding a non-endogenous, constitutively activated version of a human 
G protein-coupled receptor comprising hH9(F236K). 

74. A non-endogenous version of a human G protein-coupled receptor encoded by the
cDNA of claim 73.

75. A Plasmid comprising a Vector and the cDNA of claim 73.

76. A Host Cell comprising the Plasmid of claim 75.

77. A cDNA encoding a non-endogenous, constitutively activated version of a human 
G protein-coupled AT1 receptor selected from the group consisting of:
hAT1(F239K); hAT1(N111A); hAT1(AT2K255IC3); and hAT1(A243+).

78. A non-endogenous version of a human G protein-coupled receptor encoded by a 
cDNA of claim 77. 

79. A Plasmid comprising a Vector and the cDNA of claim 77. 

80. A Host Cell comprising the Plasmid of claim 79. 

***************************

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: Behan, Dominic P.
Lehmann-Bruinsma, Karin
Chalmers, Derek T.
Lowitz, Kevin P.
Lin, I-Lin
Dang, Huong T.
Chen, Ruoping
Liaw, Chen W.
Gore, Martin J.
White, Carol

(ii) TITLE OF INVENTION: Non-Endogenous, Constitutively Activated Human G
Protein-Coupled Receptors

(iii) NUMBER OF SEQUENCES: 146

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: Arena Pharmaceuticals, Inc.

(B) STREET: 6166 Nancy Ridge Drive

(C) CITY: San Diego

(D) STATE: CA

(E) COUNTRY: USA

(F) ZIP: 92121

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER: US

(B) FILING DATE:

(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: Burgoon, Richard P.

(B) REGISTRATION NUMBER: 34,787

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: (858)453-7200
(B) TELEFAX: (858)453-7210

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1260 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

ATGGTCTTCT CGGCAGTGTT GACTGCGTTC CATACCGGGA CATCCAACAC AACATTTGTC 60 

GTGTATGAAA ACACCTACAT GAATATTACA CTCCCTCCAC CATTCCAGCA TCCTGACCTC 120 

AGTCCATTGC TTAGATATAG TTTTGAAACC ATGGCTCCCA CTGGTTTGAG TTCCTTGACC 180 

GTGAATAGTA CAGCTGTGCC CACAACACCA GCAGCATTTA AGAGCCTAAA CTTGCCTCTT 240

CAGATCACCC TTTCTGCTAT AATGATATTC ATTCTGTTTG TGTCTTTTCT TGGGAACTTG 300

GTTGTTTGCC TCATGGTTTA CCAAAAAGCT GCCATGAGGT CTGCAATTAA CATCCTCCTT 360

GCCAGCCTAG CTTTTGCAGA CATGTTGCTT GCAGTGCTGA ACATGCCCTT TGCCCTGGTA 420

ACTATTCTTA CTACCCGATG GATTTTTGGG AAATTCTTCT GTAGGGTATC TGCTATGTTT 480

TTCTGGTTAT TTGTGATAGA AGGAGTAGCC ATCCTGCTCA TCATTAGCAT AGATAGGTTC 540

CTTATTATAG TCCAGAGGCA GGATAAGCTA AACCCATATA GAGCTAAGGT TCTGATTGCA 600

GTTTCTTGGG CAACTTCCTT TTGTGTAGCT TTTCCTTTAG CCGTAGGAAA CCCCGACCTG 660

CAGATACCTT CCCGAGCTCC CCAGTGTGTG TTTGGGTACA CAACCAATCC AGGCTACCAG 720

GCTTATGTGA TTTTGATTTC TCTCATTTCT TTCTTCATAC CCTTCCTGGT AATACTGTAC 780

TCATTTATGG GCATACTCAA CACCCTTCGG CACAATGCCT TGAGGATCCA TAGCTACCCT 840

GAAGGTATAT GCCTCAGCCA GGCCAGCAAA CTGGGTCTCA TGAGTCTGCA GAGACCTTTC 900

CAGATGAGCA TTGACATGGG CTTTAAAACA CGTGCCTTCA CCACTATTTT GATTCTCTTT 960

GCTGTCTTCA TTGTCTGCTG GGCCCCATTC ACCACTTACA GCCTTGTGGC AACATTCAGT 1020

AAGCACTTTT ACTATCAGCA CAACTTTTTT GAGATTAGCA CCTGGCTACT GTGGCTCTGC 1080

TACCTCAAGT CTGCATTGAA TCCGCTGATC TACTACTGGA GGATTAAGAA ATTCCATGAT 1140

GCTTGCCTGG ACATGATGCC TAAGTCCTTC AAGTTTTTGC CGCAGCTCCC TGGTCACACA 1200

AAGCGACGGA TACGTCCTAG TGCTGTCTAT GTGTGTGGGG AACATCGGAC GGTGGTGTGA 1260
(3) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 419 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

Met Val Phe Ser Ala Val Leu Thr Ala Phe His Thr Gly Thr Ser Asn
1 5 10 15

Thr Thr Phe Val Val Tyr Glu Asn Thr Tyr Met Asn Ile Thr Leu Pro
20 25 30

Pro Pro Phe Gln His Pro Asp Leu Ser Pro Leu Leu Arg Tyr Ser Phe
35 40 45

Glu Thr Met Ala Pro Thr Gly Leu Ser Ser Leu Thr Val Asn Ser Thr
50 55 60

Ala Val Pro Thr Thr Pro Ala Ala Phe Lys Ser Leu Asn Leu Pro Leu
65 70 75 80

Gln Ile Thr Leu Ser Ala Ile Met Ile Phe Ile Leu Phe Val Ser Phe
85 90 95

Leu Gly Asn Leu Val Val Cys Leu Met Val Tyr Gln Lys Ala Ala Met
100 105 110

Arg Ser Ala Ile Asn Ile Leu Leu Ala Ser Leu Ala Phe Ala Asp Met
115 120 125

Leu Leu Ala Val Leu Asn Met Pro Phe Ala Leu Val Thr Ile Leu Thr
130 135 140

Thr Arg Trp Ile Phe Gly Lys Phe Phe Cys Arg Val Ser Ala Met Phe
145 150 155 160

Phe Trp Leu Phe Val Ile Glu Gly Val Ala Ile Leu Leu Ile Ile Ser
165 170 175

Ile Asp Arg Phe Leu Ile Ile Val Gln Arg Gln Asp Lys Leu Asn Pro
180 185 190

Tyr Arg Ala Lys Val Leu Ile Ala Val Ser Trp Ala Thr Ser Phe Cys
195 200 205

Val Ala Phe Pro Leu Ala Val Gly Asn Pro Asp Leu Gln Ile Pro Ser
210 215 220

Arg Ala Pro Gln Cys Val Phe Gly Tyr Thr Thr Asn Pro Gly Tyr Gln
225 230 235 240

Ala Tyr Val Ile Leu Ile Ser Leu Ile Ser Phe Phe Ile Pro Phe Leu
245 250 255

Val Ile Leu Tyr Ser Phe Met Gly Ile Leu Asn Thr Leu Arg His Asn
260 265 270

Ala Leu Arg Ile His Ser Tyr Pro Glu Gly Ile Cys Leu Ser Gln Ala
275 280 285

Ser Lys Leu Gly Leu Met Ser Leu Gln Arg Pro Phe Gln Met Ser Ile
290 295 300

Asp Met Gly Phe Lys Thr Arg Ala Phe Thr Thr Ile Leu Ile Leu Phe
305 310 315 320

Ala Val Phe Ile Val Cys Trp Ala Pro Phe Thr Thr Tyr Ser Leu Val
325 330 335

Ala Thr Phe Ser Lys His Phe Tyr Tyr Gln His Asn Phe Phe Glu Ile
340 345 350

Ser Thr Trp Leu Leu Trp Leu Cys Tyr Leu Lys Ser Ala Leu Asn Pro
355 360 365

Leu Ile Tyr Tyr Trp Arg Ile Lys Lys Phe His Asp Ala Cys Leu Asp
370 375 380

Met Met Pro Lys Ser Phe Lys Phe Leu Pro Gln Leu Pro Gly His Thr
385 390 395 400

Lys Arg Arg Ile Arg Pro Ser Ala Val Tyr Val Cys Gly Glu His Arg
405 410 415

Thr Val Val 
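The nucleotide and amino acid listings can be cross-checked by translation: SEQ ID NO: 1 is 1260 base pairs, i.e. 420 codons, which corresponds to the 419 amino acids of SEQ ID NO: 2 plus a terminal TGA stop codon. The sketch below uses the standard genetic code and returns one-letter codes (M = Met, V = Val, and so on) for comparison with the three-letter listing; it strips spaces and the trailing position numbers, so listing lines can be pasted in directly. The function name is illustrative.

```python
import re
from itertools import product

# Standard genetic code; codons enumerated in TCAG order, '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(listing_text: str) -> str:
    """Translate DNA to one-letter protein, ignoring spaces and position numbers."""
    seq = re.sub(r"[^ACGT]", "", listing_text.upper())
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


first_line = "ATGGTCTTCT CGGCAGTGTT GACTGCGTTC CATACCGGGA CATCCAACAC AACATTTGTC 60"
print(translate(first_line))  # compare with the first residues of SEQ ID NO: 2
print(1260 // 3, "codons")
```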



(4) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1119 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

ATGTTAGCCA ACAGCTCCTC AACCAACAGT TCTGTTCTCC CGTGTCCTGA CTACCGACCT 60

ACCCACCGCC TGCACTTGGT GGTCTACAGC TTGGTGCTGG CTGCCGGGCT CCCCCTCAAC 120

GCGCTAGCCC TCTGGGTCTT CCTGCGCGCG CTGCGCGTGC ACTCGGTGGT GAGCGTGTAC 180

ATGTGTAACC TGGCGGCCAG CGACCTGCTC TTCACCCTCT CGCTGCCCGT TCGTCTCTCC 240

TACTACGCAC TGCACCACTG GCCCTTCCCC GACCTCCTGT GCCAGACGAC GGGCGCCATC 300

TTCCAGATGA ACATGTACGG CAGCTGCATC TTCCTGATGC TCATCAACGT GGACCGCTAC 360

GCCGCCATCG TGCACCCGCT GCGACTGCGC CACCTGCGGC GGCCCCGCGT GGCGCGGCTG 420

CTCTGCCTGG GCGTGTGGGC GCTCATCCTG GTGTTTGCCG TGCCCGCCGC CCGCGTGCAC 480

AGGCCCTCGC GTTGCCGCTA CCGGGACCTC GAGGTGCGCC TATGCTTCGA GAGCTTCAGC 540

GACGAGCTGT GGAAAGGCAG GCTGCTGCCC CTCGTGCTGC TGGCCGAGGC GCTGGGCTTC 600

CTGCTGCCCC TGGCGGCGGT GGTCTACTCG TCGGGCCGAG TCTTCTGGAC GCTGGCGCGC 660

CCCGACGCCA CGCAGAGCCA GCGGCGGCGG AAGACCGTGC GCCTCCTGCT GGCTAACCTC 720

GTCATCTTCC TGCTGTGCTT CGTGCCCTAC AACAGCACGC TGGCGGTCTA CGGGCTGCTG 780

CGGAGCAAGC TGGTGGCGGC CAGCGTGCCT GCCCGCGATC GCGTGCGCGG GGTGCTGATG 840

GTGATGGTGC TGCTGGCCGG CGCCAACTGC GTGCTGGACC CGCTGGTGTA CTACTTTAGC 900

GCCGAGGGCT TCCGCAACAC CCTGCGCGGC CTGGGCACTC CGCACCGGGC CAGGACCTCG 960

GCCACCAACG GGACGCGGGC GGCGCTCGCG CAATCCGAAA GGTCCGCCGT CACCACCGAC 1020

GCCACCAGGC CGGATGCCGC CAGTCAGGGG CTGCTCCGAC CCTCCGACTC CCACTCTCTG 1080

TCTTCCTTCA CACAGTGTCC CCAGGATTCC GCCCTCTGA 1119
(5) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 372 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Met Leu Ala Asn Ser Ser Ser Thr Asn Ser Ser Val Leu Pro Cys Pro
1 5 10 15

Asp Tyr Arg Pro Thr His Arg Leu His Leu Val Val Tyr Ser Leu Val
20 25 30

Leu Ala Ala Gly Leu Pro Leu Asn Ala Leu Ala Leu Trp Val Phe Leu
35 40 45

Arg Ala Leu Arg Val His Ser Val Val Ser Val Tyr Met Cys Asn Leu
50 55 60

Ala Ala Ser Asp Leu Leu Phe Thr Leu Ser Leu Pro Val Arg Leu Ser
65 70 75 80

Tyr Tyr Ala Leu His His Trp Pro Phe Pro Asp Leu Leu Cys Gln Thr
85 90 95

Thr Gly Ala Ile Phe Gln Met Asn Met Tyr Gly Ser Cys Ile Phe Leu
100 105 110

Met Leu Ile Asn Val Asp Arg Tyr Ala Ala Ile Val His Pro Leu Arg
115 120 125

Leu Arg His Leu Arg Arg Pro Arg Val Ala Arg Leu Leu Cys Leu Gly
130 135 140

Val Trp Ala Leu Ile Leu Val Phe Ala Val Pro Ala Ala Arg Val His
145 150 155 160

Arg Pro Ser Arg Cys Arg Tyr Arg Asp Leu Glu Val Arg Leu Cys Phe
165 170 175

Glu Ser Phe Ser Asp Glu Leu Trp Lys Gly Arg Leu Leu Pro Leu Val
180 185 190

Leu Leu Ala Glu Ala Leu Gly Phe Leu Leu Pro Leu Ala Ala Val Val
195 200 205

Tyr Ser Ser Gly Arg Val Phe Trp Thr Leu Ala Arg Pro Asp Ala Thr
210 215 220

Gln Ser Gln Arg Arg Arg Lys Thr Val Arg Leu Leu Leu Ala Asn Leu
225 230 235 240

Val Ile Phe Leu Leu Cys Phe Val Pro Tyr Asn Ser Thr Leu Ala Val
245 250 255

Tyr Gly Leu Leu Arg Ser Lys Leu Val Ala Ala Ser Val Pro Ala Arg
260 265 270

Asp Arg Val Arg Gly Val Leu Met Val Met Val Leu Leu Ala Gly Ala
275 280 285

Asn Cys Val Leu Asp Pro Leu Val Tyr Tyr Phe Ser Ala Glu Gly Phe
290 295 300

Arg Asn Thr Leu Arg Gly Leu Gly Thr Pro His Arg Ala Arg Thr Ser
305 310 315 320

Ala Thr Asn Gly Thr Arg Ala Ala Leu Ala Gln Ser Glu Arg Ser Ala
325 330 335

Val Thr Thr Asp Ala Thr Arg Pro Asp Ala Ala Ser Gln Gly Leu Leu
340 345 350

Arg Pro Ser Asp Ser His Ser Leu Ser Ser Phe Thr Gln Cys Pro Gln
355 360 365

Asp Ser Ala Leu
370




(6) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH : 1107 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

ATGGCCAACT CCACAGGGCT GAACGCCTCA GAAGTCGCAG GCTCGTTGGG GTTGATCCTG 60

GCAGCTGTCG TGGAGGTGGG GGCACTGCTG GGCAACGGCG CGCTGCTGGT CGTGGTGCTG 120

CGCACGCCGG GACTGCGCGA CGCGCTCTAC CTGGCGCACC TGTGCGTCGT GGACCTGCTG 180

GCGGCCGCCT CCATCATGCC GCTGGGCCTG CTGGCCGCAC CGCCGCCCGG GCTGGGCCGC 240

GTGCGCCTGG GCCCCGCGCC ATGCCGCGCC GCTCGCTTCC TCTCCGCCGC TCTGCTGCCG 300

GCCTGCACGC TCGGGGTGGC CGCACTTGGC CTGGCACGCT ACCGCCTCAT CGTGCACCCG 360

CTGCGGCCAG GCTCGCGGCC GCCGCCTGTG CTCGTGCTCA CCGCCGTGTG GGCCGCGGCG 420

GGACTGCTGG GCGCGCTCTC CCTGCTCGGC CCGCCGCCCG CACCGCCCCC TGCTCCTGCT 480

CGCTGCTCGG TCCTGGCTGG GGGCCTCGGG CCCTTCCGGC CGCTCTGGGC CCTGCTGGCC 540

TTCGCGCTGC CCGCCCTCCT GCTGCTCGGC GCCTACGGCG GCATCTTCGT GGTGGCGCGT 600

CGCGCTGCCC TGAGGCCCCC ACGGCCGGCG CGCGGGTCCC GACTCCGCTC GGACTCTCTG 660

GATAGCCGCC TTTCCATCTT GCCGCCGCTC CGGCCTCGCC TGCCCGGGGG CAAGGCGGCC 720

CTGGCCCCAG CGCTGGCCGT GGGCCAATTT GCAGCCTGCT GGCTGCCTTA TGGCTGCGCG 780

TGCCTGGCGC CCGCAGCGCG GGCCGCGGAA GCCGAAGCGG CTGTCACCTG GGTCGCCTAC 840

TCGGCCTTCG CGGCTCACCC CTTCCTGTAC GGGCTGCTGC AGCGCCCCGT GCGCTTGGCA 900

CTGGGCCGCC TCTCTCGCCG TGCACTGCCT GGACCTGTGC GGGCCTGCAC TCCGCAAGCC 960

TGGCACCCGC GGGCACTCTT GCAATGCCTC CAGAGACCCC CAGAGGGCCC TGCCGTAGGC 1020

CCTTCTGAGG CTCCAGAACA GACCCCCGAG TTGGCAGGAG GGCGGAGCCC CGCATACCAG 1080

GGGCCACCTG AGAGTTCTCT CTCCTGA 1107
(7) INFORMATION FOR SEQ ID NO : 6 : 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 368 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS :

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 6 :

Met Ala Asn Ser Thr Gly Leu Asn Ala Ser Glu Val Ala Gly Ser Leu 
1 5 10 15 

Gly Leu Ile Leu Ala Ala Val Val Glu Val Gly Ala Leu Leu Gly Asn

20 25 30 

Gly Ala Leu Leu Val Val Val Leu Arg Thr Pro Gly Leu Arg Asp Ala

35 40 45 

Leu Tyr Leu Ala His Leu Cys Val Val Asp Leu Leu Ala Ala Ala Ser 
50 55 60 

Ile Met Pro Leu Gly Leu Leu Ala Ala Pro Pro Pro Gly Leu Gly Arg
65 70 75 80

Val Arg Leu Gly Pro Ala Pro Cys Arg Ala Ala Arg Phe Leu Ser Ala 

85 90 95 

Ala Leu Leu Pro Ala Cys Thr Leu Gly Val Ala Ala Leu Gly Leu Ala 

100 105 110 

Arg Tyr Arg Leu Ile Val His Pro Leu Arg Pro Gly Ser Arg Pro Pro

115 120 125 

Pro Val Leu Val Leu Thr Ala Val Trp Ala Ala Ala Gly Leu Leu Gly 
130 135 140 

Ala Leu Ser Leu Leu Gly Pro Pro Pro Ala Pro Pro Pro Ala Pro Ala 
145 150 155 160

Arg Cys Ser Val Leu Ala Gly Gly Leu Gly Pro Phe Arg Pro Leu Trp 

165 170 175 

Ala Leu Leu Ala Phe Ala Leu Pro Ala Leu Leu Leu Leu Gly Ala Tyr 

180 185 190 



Gly Gly Ile Phe Val Val Ala Arg Arg Ala Ala Leu Arg Pro Pro Arg
195 200 205

Pro Ala Arg Gly Ser Arg Leu Arg Ser Asp Ser Leu Asp Ser Arg Leu
210 215 220

Ser Ile Leu Pro Pro Leu Arg Pro Arg Leu Pro Gly Gly Lys Ala Ala
225 230 235 240

Leu Ala Pro Ala Leu Ala Val Gly Gln Phe Ala Ala Cys Trp Leu Pro
245 250 255

Tyr Gly Cys Ala Cys Leu Ala Pro Ala Ala Arg Ala Ala Glu Ala Glu 

260 265 270 

Ala Ala Val Thr Trp Val Ala Tyr Ser Ala Phe Ala Ala His Pro Phe 
275 280 285 

Leu Tyr Gly Leu Leu Gln Arg Pro Val Arg Leu Ala Leu Gly Arg Leu
290 295 300

Ser Arg Arg Ala Leu Pro Gly Pro Val Arg Ala Cys Thr Pro Gln Ala
305 310 315 320

Trp His Pro Arg Ala Leu Leu Gln Cys Leu Gln Arg Pro Pro Glu Gly

325 330 335

Pro Ala Val Gly Pro Ser Glu Ala Pro Glu Gln Thr Pro Glu Leu Ala

340 345 350

Gly Gly Arg Ser Pro Ala Tyr Gln Gly Pro Pro Glu Ser Ser Leu Ser
355 360 365 

(8) INFORMATION FOR SEQ ID NO: 7: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1008 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS : single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 7 : 



ATGGAATCAT CTTTCTCATT TGGAGTGATC CTTGCTGTCC TGGCCTCCCT CATCATTGCT 60

ACTAACACAC TAGTGGCTGT GGCTGTGCTG CTGTTGATCC ACAAGAATGA TGGTGTCAGT 120

CTCTGCTTCA CCTTGAATCT GGCTGTGGCT GACACCTTGA TTGGTGTGGC CATCTCTGGC 180

CTACTCACAG ACCAGCTCTC CAGCCCTTCT CGGCCCACAC AGAAGACCCT GTGCAGCCTG 240

CGGATGGCAT TTGTCACTTC CTCCGCAGCT GCCTCTGTCC TCACGGTCAT GCTGATCACC 300

TTTGACAGGT ACCTTGCCAT CAAGCAGCCC TTCCGCTACT TGAAGATCAT GAGTGGGTTC 360

GTGGCCGGGG CCTGCATTGC CGGGCTGTGG TTAGTGTCTT ACCTCATTGG CTTCCTCCCA 420

CTCGGAATCC CCATGTTCCA GCAGACTGCC TACAAAGGGC AGTGCAGCTT CTTTGCTGTA 480

TTTCACCCTC ACTTCGTGCT GACCCTCTCC TGCGTTGGCT TCTTCCCAGC CATGCTCCTC 540

TTTGTCTTCT TCTACTGCGA CATGCTCAAG ATTGCCTCCA TGCACAGCCA GCAGATTCGA 600

AAGATGGAAC ATGCAGGAGC CATGGCTGGA GGTTATCGAT CCCCACGGAC TCCCAGCGAC 660

TTCAAAGCTC TCCGTACTGT GTCTGTTCTC ATTGGGAGCT TTGCTCTATC CTGGACCCCC 720 

TTCCTTATCA CTGGCATTGT GCAGGTGGCC TGCCAGGAGT GTCACCTCTA CCTAGTGCTG 780 

GAACGGTACC TGTGGCTGCT CGGCGTGGGC AACTCCCTGC TCAACCCACT CATCTATGCC 84 0 

5 TATTGGCAGA AGGAGGTGCG ACTGCAGCTC TACCACATGG CCCTAGGAGT GAAGAAGGTG 900 

CTCACCTCAT TCCTCCTCTT TCTCTCGGCC AGGAATTGTG GCCCAGAGAG GCCCAGGGAA 96 0 

AGTTC CTGTC ACATCGTCAC TATCTCCAGC TCAGAGTTTG ATGGCTAA 1008 

(9) INFORMATION FOR SEQ ID NO : 8 : 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 335 amino acids

(B) TYPE: amino acid
(C) STRANDEDNESS :
(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

Met Glu Ser Ser Phe Ser Phe Gly Val Ile Leu Ala Val Leu Ala Ser
1 5 10 15

Leu Ile Ile Ala Thr Asn Thr Leu Val Ala Val Ala Val Leu Leu Leu

20 25 30

Ile His Lys Asn Asp Gly Val Ser Leu Cys Phe Thr Leu Asn Leu Ala

35 40 45

Val Ala Asp Thr Leu Ile Gly Val Ala Ile Ser Gly Leu Leu Thr Asp
50 55 60

Gln Leu Ser Ser Pro Ser Arg Pro Thr Gln Lys Thr Leu Cys Ser Leu
65 70 75 80

Arg Met Ala Phe Val Thr Ser Ser Ala Ala Ala Ser Val Leu Thr Val

85 90 95

Met Leu Ile Thr Phe Asp Arg Tyr Leu Ala Ile Lys Gln Pro Phe Arg

100 105 110

Tyr Leu Lys Ile Met Ser Gly Phe Val Ala Gly Ala Cys Ile Ala Gly

115 120 125 



Leu Trp Leu Val Ser Tyr Leu Ile Gly Phe Leu Pro Leu Gly Ile Pro
130 135 140

Met Phe Gln Gln Thr Ala Tyr Lys Gly Gln Cys Ser Phe Phe Ala Val
145 150 155 160

Phe His Pro His Phe Val Leu Thr Leu Ser Cys Val Gly Phe Phe Pro

165 170 175

Ala Met Leu Leu Phe Val Phe Phe Tyr Cys Asp Met Leu Lys Ile Ala
180 185 190

Ser Met His Ser Gln Gln Ile Arg Lys Met Glu His Ala Gly Ala Met
195 200 205

Ala Gly Gly Tyr Arg Ser Pro Arg Thr Pro Ser Asp Phe Lys Ala Leu
210 215 220

Arg Thr Val Ser Val Leu Ile Gly Ser Phe Ala Leu Ser Trp Thr Pro

225 230 235 240

Phe Leu Ile Thr Gly Ile Val Gln Val Ala Cys Gln Glu Cys His Leu

245 250 255

Tyr Leu Val Leu Glu Arg Tyr Leu Trp Leu Leu Gly Val Gly Asn Ser
260 265 270

Leu Leu Asn Pro Leu Ile Tyr Ala Tyr Trp Gln Lys Glu Val Arg Leu
275 280 285

Gln Leu Tyr His Met Ala Leu Gly Val Lys Lys Val Leu Thr Ser Phe
290 295 300

Leu Leu Phe Leu Ser Ala Arg Asn Cys Gly Pro Glu Arg Pro Arg Glu

305 310 315 320

Ser Ser Cys His Ile Val Thr Ile Ser Ser Ser Glu Phe Asp Gly

325 330 335 

(10) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1413 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

ATGGACACTA CCATGGAAGC TGACCTGGGT GCCACTGGCC ACAGGCCCCG CACAGAGCTT 60

GATGATGAGG ACTCCTACCC CCAAGGTGGC TGGGACACGG TCTTCCTGGT GGCCCTGCTG 120

CTCCTTGGGC TGCCAGCCAA TGGGTTGATG GCGTGGCTGG CCGGCTCCCA GGCCCGGCAT 180

GGAGCTGGCA CGCGTCTGGC GCTGCTCCTG CTCAGCCTGG CCCTCTCTGA CTTCTTGTTC 240

CTGGCAGCAG CGGCCTTCCA GATCCTAGAG ATCCGGCATG GGGGACACTG GCCGCTGGGG 300

ACAGCTGCCT GCCGCTTCTA CTACTTCCTA TGGGGCGTGT CCTACTCCTC CGGCCTCTTC 360

CTGCTGGCCG CCCTCAGCCT CGACCGCTGC CTGCTGGCGC TGTGCCCACA CTGGTACCCT 420

GGGCACCGCC CAGTCCGCCT GCCCCTCTGG GTCTGCGCCG GTGTCTGGGT GCTGGCCACA 480

CTCTTCAGCG TGCCCTGGCT GGTCTTCCCC GAGGCTGCCG TCTGGTGGTA CGACCTGGTC 540

ATCTGCCTGG ACTTCTGGGA CAGCGAGGAG CTGTCGCTGA GGATGCTGGA GGTCCTGGGG 600

GGCTTCCTGC CTTTCCTCCT GCTGCTCGTC TGCCACGTGC TCACCCAGGC CACAGCCTGT 660

CGCACCTGCC ACCGCCAACA GCAGCCCGCA GCCTGCCGGG GCTTCGCCCG TGTGGCCAGG 720

ACCATTCTGT CAGCCTATGT GGTCCTGAGG CTGCCCTACC AGCTGGCCCA GCTGCTCTAC 780

CTGGCCTTCC TGTGGGACGT CTACTCTGGC TAGCTGCTCT GGGAGGCCCT GGTCTACTCC 840
GACTACCTGA TCCTACTCAA CAGCTGCCTC AGCCCCTTCC TCTGCCTCAT GGCCAGTGCC 900 
G AC CTCCGG A CCCTGCTGCG CTCCGTGCTC TCGTCCTTCG CGGCAGCTCT CTG CGAGGAG 960 

CGGCCGGGCA GCTTCACGCC CACTGAGCCA CAGACCCAGC TAGATTCTGA GGGTC CAACT 1020 

CTGCCAGAGC CGATGGCAGA GGCCCAGTCA CAGATGGATC CTGTGGC CC A GCCTCAGGTG 1080 

AACCCCACAC TCCAGCCACG ATCGGATCCC ACAG CTCAGC CACAGCTGAA CCCTACGGCC 114 0 

C AGC CACAGT CGGATCCCAC AGCCCAGCCA CAGCTGAACC TCATGGCCCA GCCACAGTCA 1200 

GATTCTGTGG CCCAGCCACA GGCAGACACT AACGTCCAGA CCCCTGCACC TGCTGCCAGT 126 0 

TCTGTGCCCA GTCCCTGTGA TGAAGCTTCC CCAACCCCAT CCTCGCATCC TACCCCAGGG 1320 

GCCCTTGAGG ACCCAGCCAC ACCTCCTGCC TCTGAAGGAG AAAGCCCCAG CAGCACCCCG 1380 

CCAGAGGCGG CCCCGGGCGC AGGCCCCACG TGA 1413 
(11) INFORMATION FOR SEQ ID NO:10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 468 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: not relevant 
(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

Met Asp Thr Thr Met Glu Ala Asp Leu Gly Ala Thr Gly His Arg Pro 
1 5 10 15

Arg Thr Glu Leu Asp Asp Glu Asp Ser Tyr Pro Gln Gly Gly Trp Asp
20 25 30

Thr Val Phe Leu Val Ala Leu Leu Leu Leu Gly Leu Pro Ala Asn Gly
35 40 45

Leu Met Ala Trp Leu Ala Gly Ser Gln Ala Arg His Gly Ala Gly Thr
50 55 60

Arg Leu Ala Leu Leu Leu Leu Ser Leu Ala Leu Ser Asp Phe Leu Phe
65 70 75 80

Leu Ala Ala Ala Ala Phe Gln Ile Leu Glu Ile Arg His Gly Gly His
85 90 95

Trp Pro Leu Gly Thr Ala Ala Cys Arg Phe Tyr Tyr Phe Leu Trp Gly
100 105 110

Val Ser Tyr Ser Ser Gly Leu Phe Leu Leu Ala Ala Leu Ser Leu Asp
115 120 125

Arg Cys Leu Leu Ala Leu Cys Pro His Trp Tyr Pro Gly His Arg Pro
130 135 140

Val Arg Leu Pro Leu Trp Val Cys Ala Gly Val Trp Val Leu Ala Thr
145 150 155 160

Leu Phe Ser Val Pro Trp Leu Val Phe Pro Glu Ala Ala Val Trp Trp
165 170 175

Tyr Asp Leu Val Ile Cys Leu Asp Phe Trp Asp Ser Glu Glu Leu Ser
180 185 190

Leu Arg Met Leu Glu Val Leu Gly Gly Phe Leu Pro Phe Leu Leu Leu
195 200 205

Leu Val Cys His Val Leu Thr Gln Ala Thr Arg Thr Cys His Arg Gln
210 215 220

Gln Gln Pro Ala Ala Cys Arg Gly Phe Ala Arg Val Ala Arg Thr Ile
225 230 235 240

Leu Ser Ala Tyr Val Val Leu Arg Leu Pro Tyr Gln Leu Ala Gln Leu
245 250 255

Leu Tyr Leu Ala Phe Leu Trp Asp Val Tyr Ser Gly Tyr Leu Leu Trp
260 265 270

Glu Ala Leu Val Tyr Ser Asp Tyr Leu Ile Leu Leu Asn Ser Cys Leu
275 280 285

Ser Pro Phe Leu Cys Leu Met Ala Ser Ala Asp Leu Arg Thr Leu Leu
290 295 300

Arg Ser Val Leu Ser Ser Phe Ala Ala Ala Leu Cys Glu Glu Arg Pro
305 310 315 320

Gly Ser Phe Thr Pro Thr Glu Pro Gln Thr Gln Leu Asp Ser Glu Gly

325 330 335

Pro Thr Leu Pro Glu Pro Met Ala Glu Ala Gln Ser Gln Met Asp Pro

340 345 350

Val Ala Gln Pro Gln Val Asn Pro Thr Leu Gln Pro Arg Ser Asp Pro
355 360 365

Thr Ala Gln Pro Gln Leu Asn Pro Thr Ala Gln Pro Gln Ser Asp Pro
370 375 380

Thr Ala Gln Pro Gln Leu Asn Leu Met Ala Gln Pro Gln Ser Asp Ser
385 390 395 400

Val Ala Gln Pro Gln Ala Asp Thr Asn Val Gln Thr Pro Ala Pro Ala

405 410 415 

Ala Ser Ser Val Pro Ser Pro Cys Asp Glu Ala Ser Pro Thr Pro Ser 

420 425 430 

Ser His Pro Thr Pro Gly Ala Leu Glu Asp Pro Ala Thr Pro Pro Ala 
435 440 445 

Ser Glu Gly Glu Ser Pro Ser Ser Thr Pro Pro Glu Ala Ala Pro Gly 
450 455 460 

Ala Gly Pro Thr 
465 

(12) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1248 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS : single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

ATGTCAGGGA TGGAAAAACT TCAGAATGCT TCCTGGATCT ACCAGCAGAA ACTAGAAGAT 60 

CCATTCCAGA AACACCTGAA CAGCACCGAG GAGTATCTGG CCTTCCTCTG CGGACCTCGG 120

CGCAGCCACT TCTTCCTCCC CGTGTCTGTG GTGTATGTGC CAATTTTTGT GGTGGGGGTC 180

ATTGGCAATG TCCTGGTGTG CCTGGTGATT CTGCAGCACC AGGCTATGAA GACGCCCACC 240

AACTACTACC TCTTCAGCCT GGCGGTCTCT GACCTCCTGG TCCTGCTCCT TGGAATGCCC 300

CTGGAGGTCT ATGAGATGTG GCGCAACTAC CCTTTCTTGT TCGGGCCCGT GGGCTGCTAC 360

TTCAAGACGG CCCTCTTTGA GACCGTGTGC TTCGCCTCCA TCCTCAGCAT CACCACCGTC 420

AGCGTGGAGC GCTACGTGGC CATCCTACAC CCGTTCCGCG CCAAACTGCA GAGCACCCGG 480

CGCCGGC3CCC TCAGGATCCT CGGCATCGTC TGGGGCTTCT CCGTGCTCTT CTCCCTGCCC 540

AACACCAGCA TCCATGGCAT CAAGTTCCAC TACTTCCCCA ATGGGTCCCT GGTCCCAGGT 600

TCGGCCACCT GTACGGTCAT CAAGCCCATG TGGATCTACA ATTTCATCAT CCAGGTCACC 660

TCCTTCCTAT TCTACCTCCT CCCCATGACT GTCATCAGTG TCCTCTACTA CCTCATGGCA 720

CTCAGACTAA AGAAAGACAA ATCTCTTGAG GCAGATGAAG GGAATGCAAA TATTCAAAGA 780

CCCTGCAGAA AATCAGTCAA CAAGATGCTG TTTGTCTTGG TCTTAGTGTT TGCTATCTGT 840

TGGGCCCCGT TCCACATTGA CCGACTCTTC TTCAGCTTTG TGGAGGAGTG GAGTGAATCC 900

CTGGCTGCTG TGTTCAACCT CGTCCATGTG GTGTCAGGTG TCTTCTTCTA CCTGAGCTCA 960

GCTGTCAACC CCATTATCTA TAACCTACTG TCTCGCCGCT TCCAGGCAGC ATTCCAGAAT 1020

GTGATCTCTT CTTTCCACAA ACAGTGGCAC TCCCAGCATG ACCCACAGTT GCCACCTGCC 1080

CAGCGGAACA TCTTCCTGAC AGAATGCCAC TTTGTGGAGC TGACCGAAGA TATAGGTCCC 1140

CAATTCCCAT GTCAGTCATC CATGCACAAC TCTCACCTCC CAACAGCCCT CTCTAGTGAA 1200

CAGATGTCAA GAACAAACTA TCAAAGCTTC CACTTTAACA AAACCTGA 1248
(13) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 415 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY : not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

Met Ser Gly Met Glu Lys Leu Gln Asn Ala Ser Trp Ile Tyr Gln Gln
1 5 10 15

Lys Leu Glu Asp Pro Phe Gln Lys His Leu Asn Ser Thr Glu Glu Tyr

20 25 30 

Leu Ala Phe Leu Cys Gly Pro Arg Arg Ser His Phe Phe Leu Pro Val 
35 40 45 



Ser Val Val Tyr Val Pro Ile Phe Val Val Gly Val Ile Gly Asn Val
50 55 60

Leu Val Cys Leu Val Ile Leu Gln His Gln Ala Met Lys Thr Pro Thr
65 70 75 80

Asn Tyr Tyr Leu Phe Ser Leu Ala Val Ser Asp Leu Leu Val Leu Leu
85 90 95

Leu Gly Met Pro Leu Glu Val Tyr Glu Met Trp Arg Asn Tyr Pro Phe
100 105 110

Leu Phe Gly Pro Val Gly Cys Tyr Phe Lys Thr Ala Leu Phe Glu Thr
115 120 125

Val Cys Phe Ala Ser Ile Leu Ser Ile Thr Thr Val Ser Val Glu Arg
130 135 140

Tyr Val Ala Ile Leu His Pro Phe Arg Ala Lys Leu Gln Ser Thr Arg
145 150 155 160

Arg Arg Ala Leu Arg Ile Leu Gly Ile Val Trp Gly Phe Ser Val Leu
165 170 175

Phe Ser Leu Pro Asn Thr Ser Ile His Gly Ile Lys Phe His Tyr Phe
180 185 190

Pro Asn Gly Ser Leu Val Pro Gly Ser Ala Thr Cys Thr Val Ile Lys
195 200 205

Pro Met Trp Ile Tyr Asn Phe Ile Ile Gln Val Thr Ser Phe Leu Phe
210 215 220

Tyr Leu Leu Pro Met Thr Val Ile Ser Val Leu Tyr Tyr Leu Met Ala
225 230 235 240

Leu Arg Leu Lys Lys Asp Lys Ser Leu Glu Ala Asp Glu Gly Asn Ala
245 250 255

Asn Ile Gln Arg Pro Cys Arg Lys Ser Val Asn Lys Met Leu Phe Val
260 265 270

Leu Val Leu Val Phe Ala Ile Cys Trp Ala Pro Phe His Ile Asp Arg
275 280 285

Leu Phe Phe Ser Phe Val Glu Glu Trp Ser Glu Ser Leu Ala Ala Val
290 295 300

Phe Asn Leu Val His Val Val Ser Gly Val Phe Phe Tyr Leu Ser Ser
305 310 315 320

Ala Val Asn Pro Ile Ile Tyr Asn Leu Leu Ser Arg Arg Phe Gln Ala
325 330 335

Ala Phe Gln Asn Val Ile Ser Ser Phe His Lys Gln Trp His Ser Gln
340 345 350

His Asp Pro Gln Leu Pro Pro Ala Gln Arg Asn Ile Phe Leu Thr Glu
355 360 365 

Cys His Phe Val Glu Leu Thr Glu Asp Ile Gly Pro Gln Phe Pro Cys
370 375 380

Gln Ser Ser Met His Asn Ser His Leu Pro Thr Ala Leu Ser Ser Glu

385 390 395 400

Gln Met Ser Arg Thr Asn Tyr Gln Ser Phe His Phe Asn Lys Thr

405 410 415 

(14) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1173 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS : single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13: 
ATGCCAGATA CTAATAGCAC AATCAATTTA TCACTAAGCA CTCGTGTTAC TTTAGCATTT 60

TTTATGTCCT TAGTAGCTTT TGCTATAATG CTAGGAAATG CTTTGGTCAT TTTAGCTTTT 120

GTGGTGGACA AAAACCTTAG ACATCGAAGT AGTTATTTTT TTCTTAACTT GGCCATCTCT 180

GACTTCTTTG TGGGTGTGAT CTCCATTCCT TTGTACATCC CTCACACGCT GTTCGAATGG 240

GATTTTGGAA AGGAAATCTG TGTATTTTGG CTCACTACTG ACTATCTGTT ATGTACAGCA 300

TCTGTATATA ACATTGTCCT CATCAGCTAT GATCGATACC TGTCAGTCTC AAATGCTGTG 360

TCTTATAGAA CTCAACATAC TGGGGTCTTG AAGATTGTTA CTCTGATGGT GGCCGTTTGG 420

GTGCTGGCCT TCTTAGTGAA TGGGCCAATG ATTCTAGTTT CAGAGTCTTG GAAGGATGAA 480

GGTAGTGAAT GTGAACCTGG ATTTTTTTCG GAATGGTACA TCCTTGCCAT CACATCATTC 540

TTGGAATTCG TGATCCCAGT CATCTTAGTC GCTTATTTCA ACATGAATAT TTATTGGAGC 600

CTGTGGAAGC GTGATCATCT CAGTAGGTGC CAAAGCCATC CTGGACTGAC TGCTGTCTCT 660

TCCAACATCT GTGGACACTC ATTCAGAGGT AGACTATCTT CAAGGAGATC TCTTTCTGCA 720

TCGACAGAAG TTCCTGCATC CTTTCATTCA GAGAGACAGA GGAGAAAGAG TAGTCTCATG 780

TTTTCCTCAA GAACCAAGAT GAATAGCAAT ACAATTGCTT CCAAAATGGG TTCCTTCTCC 840

CAATCAGATT CTGTAGCTCT TCACCAAAGG GAACATGTTG AACTGCTTAG AGCCAGGAGA 900

TTAGCCAAGT CACTGGCCAT TCTCTTAGGG GTTTTTGCTG TTTGCTGGGC TCCATATTCT 960

CTGTTCACAA TTGTCCTTTC ATTTTATTCC TCAGCAACAG GTCCTAAATC AGTTTGGTAT 1020

AGAATTGCAT TTTGGCTTCA GTGGTTCAAT TCCTTTGTCA ATCCTCTTTT GTATCCATTG 1080

TGTCACAAGC GCTTTCAAAA GGCTTTCTTG AAAATATTTT GTATAAAAAA GCAACCTCTA 1140

CCATCACAAC ACAGTCGGTC AGTATCTTCT TAA 1173
(15) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 390 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS :

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

Met Pro Asp Thr Asn Ser Thr Ile Asn Leu Ser Leu Ser Thr Arg Val
1 5 10 15

Thr Leu Ala Phe Phe Met Ser Leu Val Ala Phe Ala Ile Met Leu Gly

20 25 30

Asn Ala Leu Val Ile Leu Ala Phe Val Val Asp Lys Asn Leu Arg His
35 40 45

Arg Ser Ser Tyr Phe Phe Leu Asn Leu Ala Ile Ser Asp Phe Phe Val
50 55 60

Gly Val Ile Ser Ile Pro Leu Tyr Ile Pro His Thr Leu Phe Glu Trp
65 70 75 80

Asp Phe Gly Lys Glu Ile Cys Val Phe Trp Leu Thr Thr Asp Tyr Leu

85 90 95

Leu Cys Thr Ala Ser Val Tyr Asn Ile Val Leu Ile Ser Tyr Asp Arg

100 105 110

Tyr Leu Ser Val Ser Asn Ala Val Ser Tyr Arg Thr Gln His Thr Gly
115 120 125

Val Leu Lys Ile Val Thr Leu Met Val Ala Val Trp Val Leu Ala Phe
130 135 140

Leu Val Asn Gly Pro Met Ile Leu Val Ser Glu Ser Trp Lys Asp Glu
145 150 155 160

Gly Ser Glu Cys Glu Pro Gly Phe Phe Ser Glu Trp Tyr Ile Leu Ala

165 170 175 
Ile Thr Ser Phe Leu Glu Phe Val Ile Pro Val Ile Leu Val Ala Tyr
180 185 190

Phe Asn Met Asn Ile Tyr Trp Ser Leu Trp Lys Arg Asp His Leu Ser
195 200 205

Arg Cys Gln Ser His Pro Gly Leu Thr Ala Val Ser Ser Asn Ile Cys
210 215 220

Gly His Ser Phe Arg Gly Arg Leu Ser Ser Arg Arg Ser Leu Ser Ala
225 230 235 240

Ser Thr Glu Val Pro Ala Ser Phe His Ser Glu Arg Gln Arg Arg Lys
245 250 255

Ser Ser Leu Met Phe Ser Ser Arg Thr Lys Met Asn Ser Asn Thr Ile
260 265 270

Ala Ser Lys Met Gly Ser Phe Ser Gln Ser Asp Ser Val Ala Leu His
275 280 285

Gln Arg Glu His Val Glu Leu Leu Arg Ala Arg Arg Leu Ala Lys Ser
290 295 300

Leu Ala Ile Leu Leu Gly Val Phe Ala Val Cys Trp Ala Pro Tyr Ser
305 310 315 320

Leu Phe Thr Ile Val Leu Ser Phe Tyr Ser Ser Ala Thr Gly Pro Lys
325 330 335

Ser Val Trp Tyr Arg Ile Ala Phe Trp Leu Gln Trp Phe Asn Ser Phe
340 345 350

Val Asn Pro Leu Leu Tyr Pro Leu Cys His Lys Arg Phe Gln Lys Ala
355 360 365

Phe Leu Lys Ile Phe Cys Ile Lys Lys Gln Pro Leu Pro Ser Gln His
370 375 380

Ser Arg Ser Val Ser Ser
385 390

(16) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(iv) ANTI-SENSE: NO 



(Xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
GGAAAGCTTA ACGATCCCCA GGAGCAACAT 30

(17) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(iv) ANTI-SENSE: YES 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

CTGGGATCCT ACGAGAGCAT TTTTCACACA G 31

(18) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 1128 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

ATGGCGAACG CGAGCGAGCC GGGTGGCAGC GGCGGCGGCG AGGCGGCCGC CCTGGGCCTC 60

AAGCTGGCCA CGCTCAGCCT GCTGCTGTGC GTGAGCCTAG CGGGCAACGT GCTGTTCGCG 120

CTGCTGATCG TGCGGGAGCG CAGCCTGCAC CGCGCCCCGT ACTACCTGCT GCTCGACCTG 180

TGCCTGGCCG ACGGGCTGCG CGCGCTCGCC TGCCTCCCGG CCGTCATGCT GGCGGCGCGG 240

CGTGCGGCGG CCGCGGCGGG GGCGCCGCCG GGCGCGCTGG GCTGCAAGCT GCTCGCCTTC 300

CTGGCCGCGC TCTTCTGCTT CCACGCCGCC TTCCTGCTGC TGGGCGTGGG CGTCACCCGC 360

TACCTGGCCA TCGCGCACCA CCGCTTCTAT GCAGAGCGCC TGGCCGGCTG GCCGTGCGCC 420

GCCATGCTGG TGTGCGCCGC CTGGGCGCTG GCGCTGGCCG CGGCCTTCCC GCCAGTGCTG 480

GACGGCGGTG GCGACGACGA GGACGCGCCG TGCGCCCTGG AGCAGCGGCC CGACGGCGCC 540

CCCGGCGCGC TGGGCTTCCT GCTGCTGCTG GCCGTGGTGG TGGGCGCCAC GCACCTCGTC 600

TACCTCCGCC TGCTCTTCTT CATCCACGAC CGCCGCAAGA TGCGGCCCGC GCGCCTGGTG 660

CCCGCCGTCA GCCACGACTG GACCTTCCAC GGCCCGGGCG CCACCGGCCA GGCGGCCGCC 720

AACTGGACGG CGGGCTTCGG CCGCGGGCCC ACGCCGCCCG CGCTTGTGGG CATCCGGCCC 780

GCAGGGCCGG GCCGCGGCGC GCGCCGCCTC CTCGTGCTGG AAGAATTCAA GACGGAGAAG 840

AGGCTGTGCA AGATGTTCTA CGCCGTCACG CTGCTCTTCC TGCTCCTCTG GGGGCCCTAC 900

GTCGTGGCCA GCTACCTGCG GGTCCTGGTG CGGCCCGGCG CCGTCCCCCA GGCCTACCTG 960

ACGGCCTCCG TGTGGCTGAC CTTCGCGCAG GCCGGCATCA ACCCCGTCGT GTGCTTCCTC 1020

TTCAACAGGG AGCTGAGGGA CTGCTTCAGG GCCCAGTTCC CCTGCTGCCA GAGCCCCCGG 1080

ACCACCCAGG CGACCCATCC CTGCGACCTG AAAGGCATTG GTTTATGA 1128
(19) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 375 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

Met Ala Asn Ala Ser Glu Pro Gly Gly Ser Gly Gly Gly Glu Ala Ala
1 5 10 15

Ala Leu Gly Leu Lys Leu Ala Thr Leu Ser Leu Leu Leu Cys Val Ser 

20 25 30 

Leu Ala Gly Asn Val Leu Phe Ala Leu Leu Ile Val Arg Glu Arg Ser
35 40 45 

Leu His Arg Ala Pro Tyr Tyr Leu Leu Leu Asp Leu Cys Leu Ala Asp 
50 55 60 

Gly Leu Arg Ala Leu Ala Cys Leu Pro Ala Val Met Leu Ala Ala Arg 
65 70 75 80 

Arg Ala Ala Ala Ala Ala Gly Ala Pro Pro Gly Ala Leu Gly Cys Lys 

85 90 95 

Leu Leu Ala Phe Leu Ala Ala Leu Phe Cys Phe His Ala Ala Phe Leu 

100 105 110

Leu Leu Gly Val Gly Val Thr Arg Tyr Leu Ala Ile Ala His His Arg
115 120 125 

Phe Tyr Ala Glu Arg Leu Ala Gly Trp Pro Cys Ala Ala Met Leu Val 
130 135 140 
Cys Ala Ala Trp Ala Leu Ala Leu Ala Ala Ala Phe Pro Pro Val Leu
145 150 155 160

Asp Gly Gly Gly Asp Asp Glu Asp Ala Pro Cys Ala Leu Glu Gln Arg

165 170 175

Pro Asp Gly Ala Pro Gly Ala Leu Gly Phe Leu Leu Leu Leu Ala Val

180 185 190

Val Val Gly Ala Thr His Leu Val Tyr Leu Arg Leu Leu Phe Phe Ile
195 200 205

His Asp Arg Arg Lys Met Arg Pro Ala Arg Leu Val Pro Ala Val Ser
210 215 220

His Asp Trp Thr Phe His Gly Pro Gly Ala Thr Gly Gln Ala Ala Ala
225 230 235 240

Asn Trp Thr Ala Gly Phe Gly Arg Gly Pro Thr Pro Pro Ala Leu Val

245 250 255

Gly Ile Arg Pro Ala Gly Pro Gly Arg Gly Ala Arg Arg Leu Leu Val

260 265 270

Leu Glu Glu Phe Lys Thr Glu Lys Arg Leu Cys Lys Met Phe Tyr Ala
275 280 285

Val Thr Leu Leu Phe Leu Leu Leu Trp Gly Pro Tyr Val Val Ala Ser
290 295 300

Tyr Leu Arg Val Leu Val Arg Pro Gly Ala Val Pro Gln Ala Tyr Leu
305 310 315 320

Thr Ala Ser Val Trp Leu Thr Phe Ala Gln Ala Gly Ile Asn Pro Val

325 330 335

Val Cys Phe Leu Phe Asn Arg Glu Leu Arg Asp Cys Phe Arg Ala Gln

340 345 350

Phe Pro Cys Cys Gln Ser Pro Arg Thr Thr Gln Ala Thr His Pro Cys
355 360 365

Asp Leu Lys Gly Ile Gly Leu
370 375

(20) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1002 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS : single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:

ATGAACACCA CAGTGATGCA AGGCTTCAAC AGATCTGAGC GGTGCCCCAG AGACACTCGG 60

ATAGTACAGC TGGTATTCCC AGCCCTCTAC ACAGTGGTTT TCTTGACCGG CATCCTGCTG 120

AATACTTTGG CTCTGTGGGT GTTTGTTCAC ATCCCCAGCT CCTCCACCTT CATCATCTAC 180

CTCAAAAACA CTTTGGTGGC CGACTTGATA ATGACACTCA TGCTTCCTTT CAAAATCCTC 240

TCTGACTCAC ACCTGGCACC CTGGCAGCTC AGAGCTTTTG TGTGTCGTTT TTCTTCGGTG 300

ATATTTTATG AGACCATGTA TGTGGGCATC GTGCTGTTAG GGCTCATAGC CTTTGACAGA 360

TTCCTCAAGA TCATCAGACC TTTGAGAAAT ATTTTTCTAA AAAAACCTGT TTTTGCAAAA 420

ACGGTCTCAA TCTTCATCTG GTTCTTTTTG TTCTTCATCT CCCTGCCAAA TACGATCTTG 480

AGCAACAAGG AAGCAACACC ATCGTCTGTG AAAAAGTGTG CTTCCTTAAA GGGGCCTCTG 540

GGGCTGAAAT GGCATCAAAT GGTAAATAAC ATATGCCAGT TTATTTTCTG GACTGTTTTT 600

ATCCTAATGC TTGTGTTTTA TGTGGTTATT GCAAAAAAAG TATATGATTC TTATAGAAAG 660

TCCAAAAGTA AGGACAGAAA AAACAACAAA AAGCTGGAAG GCAAAGTATT TGTTGTCGTG 720

GCTGTCTTCT TTGTGTGTTT TGCTCCATTT CATTTTGCCA GAGTTCCATA TACTCACAGT 780

CAAACCAACA ATAAGACTGA CTGTAGACTG CAAAATCAAC TGTTTATTGC TAAAGAAACA 840

ACTCTCTTTT TGGCAGCAAC TAACATTTGT ATGGATCCCT TAATATACAT ATTCTTATGT 900

AAAAAATTCA CAGAAAAGCT ACCATGTATG CAAGGGAGAA AGACCACAGC ATCAAGCCAA 960

GAAAATCATA GCAGTCAGAC AGACAACATA ACCTTAGGCT GA 1002
(21) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 333 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS :

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20: 

Met Asn Thr Thr Val Met Gln Gly Phe Asn Arg Ser Glu Arg Cys Pro
1 5 10 15

Arg Asp Thr Arg Ile Val Gln Leu Val Phe Pro Ala Leu Tyr Thr Val

20 25 30 
Val Phe Leu Thr Gly Ile Leu Leu Asn Thr Leu Ala Leu Trp Val Phe
35 40 45

Val His Ile Pro Ser Ser Ser Thr Phe Ile Ile Tyr Leu Lys Asn Thr
50 55 60

Leu Val Ala Asp Leu Ile Met Thr Leu Met Leu Pro Phe Lys Ile Leu
65 70 75 80

Ser Asp Ser His Leu Ala Pro Trp Gln Leu Arg Ala Phe Val Cys Arg
85 90 95

Phe Ser Ser Val Ile Phe Tyr Glu Thr Met Tyr Val Gly Ile Val Leu
100 105 110

Leu Gly Leu Ile Ala Phe Asp Arg Phe Leu Lys Ile Ile Arg Pro Leu
115 120 125

Arg Asn Ile Phe Leu Lys Lys Pro Val Phe Ala Lys Thr Val Ser Ile
130 135 140

Phe Ile Trp Phe Phe Leu Phe Phe Ile Ser Leu Pro Asn Thr Ile Leu
145 150 155 160

Ser Asn Lys Glu Ala Thr Pro Ser Ser Val Lys Lys Cys Ala Ser Leu
165 170 175

Lys Gly Pro Leu Gly Leu Lys Trp His Gln Met Val Asn Asn Ile Cys
180 185 190

Gln Phe Ile Phe Trp Thr Val Phe Ile Leu Met Leu Val Phe Tyr Val
195 200 205

Val Ile Ala Lys Lys Val Tyr Asp Ser Tyr Arg Lys Ser Lys Ser Lys
210 215 220

Asp Arg Lys Asn Asn Lys Lys Leu Glu Gly Lys Val Phe Val Val Val
225 230 235 240

Ala Val Phe Phe Val Cys Phe Ala Pro Phe His Phe Ala Arg Val Pro
245 250 255

Tyr Thr His Ser Gln Thr Asn Asn Lys Thr Asp Cys Arg Leu Gln Asn
260 265 270

Gln Leu Phe Ile Ala Lys Glu Thr Thr Leu Phe Leu Ala Ala Thr Asn
275 280 285

Ile Cys Met Asp Pro Leu Ile Tyr Ile Phe Leu Cys Lys Lys Phe Thr
290 295 300

Glu Lys Leu Pro Cys Met Gln Gly Arg Lys Thr Thr Ala Ser Ser Gln
305 310 315 320

Glu Asn His Ser Ser Gln Thr Asp Asn Ile Thr Leu Gly
325 330
(22) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1122 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21: 








ATGGCCAACA CTACCGGAGA GCCTGAGGAG GTGAGCGGCG CTCTGTCCCC ACCGTCCGCA 60

TCAGCTTATG TGAAGCTGGT ACTGCTGGGA CTGATTATGT GCGTGAGCCT GGCGGGTAAC 120

GCCATCTTGT CCCTGCTGGT GCTCAAGGAG CGTGCCCTGC ACAAGGCTCC TTACTACTTC 180

CTGCTGGACC TGTGCCTGGC CGATGGCATA CGCTCTGCCG TCTGCTTCCC CTTTGTGCTG 240

GCTTCTGTGC GCCACGGCTC TTCATGGACC TTCAGTGCAC TCAGCTGCAA GATTGTGGCC 300

TTTATGGCCG TGCTCTTTTG CTTCCATGCG GCCTTCATG v_ TGTTCTGCAT CAGCGTCACC 360

CGCTACATGG CCATCGCCCA CCACCGCTTC 1 AC Li L. L~A/\vi\. GCATGACACT CTGGACATGC 420

GCGGCTGTCA TCTGCATGGC CTGGACCCTG TCTGTGGCCA TGGCCTTCCC ACCTGTCTTT 480

GACGTGGGCA CCTACAAGTT TATTCGGGAG GAGGACCAGT GCATCTTTGA GCATCGCTAC 540

TTCAAGGCCA ATGACACGCT GGGCTTCATG CTTATGTTGG CTGTGCTCAT GGCAGCTACC 600

CATGCTGTCT ACGGCAAGCT GCTCCTCTTC GAGTATCGTC ACCGCAAGAT GAAGCCAGTG 660

CAGATGGTGC CAGCCATCAG CCAGAACTGG ACATTCCATG GTCCCGGGGC CACCGGCCAG 720

GCTGCTGCCA ACTGGATCGC CGGCTTTGGC CGTGGGCCCA TGCCACCAAC CCTGCTGGGT 780

ATCCGGCAGA ATGGGCATGC AGCCAGCCGG CGGCTACTGG GCATGGACGA GGTCAAGGGT 840

GAAAAGCAGC TGGGCCGCAT GTTCTACGCG ATCACACTGC TCTTTCTGCT CCTCTGGTCA 900

CCCTACATCG TGGCCTGCTA CTGGCGAGTG TTTGTGAAAG CCTGTGCTGT GCCCCACCGC 960

TACCTGGCCA CTGCTGTTTG GATGAGCTTC GCCCAGGCTG CCGTCAACCC AATTGTCTGC 1020

TTCCTGCTCA ACAAGGACCT CAAGAAGTGC CTGACCACTC ACGCCCCCTG CTGGGGCACA 1080

GGAGGTGCCC CGGCTCCCAG AGAACCCTAC TGTGTCATGT GA 1122



(23) INFORMATION FOR SEQ ID NO: 22: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 373 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22: 

Met Ala Asn Thr Thr Gly Glu Pro Glu Glu Val Ser Gly Ala Leu Ser 
1 5 10 15 

Pro Pro Ser Ala Ser Ala Tyr Val Lys Leu Val Leu Leu Gly Leu Ile 
20 25 30 

Met Cys Val Ser Leu Ala Gly Asn Ala Ile Leu Ser Leu Leu Val Leu 
35 40 45 

Lys Glu Arg Ala Leu His Lys Ala Pro Tyr Tyr Phe Leu Leu Asp Leu 
50 55 60 

Cys Leu Ala Asp Gly Ile Arg Ser Ala Val Cys Phe Pro Phe Val Leu 
65 70 75 80 

Ala Ser Val Arg His Gly Ser Ser Trp Thr Phe Ser Ala Leu Ser Cys 
85 90 95 

Lys Ile Val Ala Phe Met Ala Val Leu Phe Cys Phe His Ala Ala Phe 
100 105 110 

Met Leu Phe Cys Ile Ser Val Thr Arg Tyr Met Ala Ile Ala His His 
115 120 125 

Arg Phe Tyr Ala Lys Arg Met Thr Leu Trp Thr Cys Ala Ala Val Ile 
130 135 140 

Cys Met Ala Trp Thr Leu Ser Val Ala Met Ala Phe Pro Pro Val Phe 
145 150 155 160 

Asp Val Gly Thr Tyr Lys Phe Ile Arg Glu Glu Asp Gln Cys Ile Phe 
165 170 175 

Glu His Arg Tyr Phe Lys Ala Asn Asp Thr Leu Gly Phe Met Leu Met 
180 185 190 

Leu Ala Val Leu Met Ala Ala Thr His Ala Val Tyr Gly Lys Leu Leu 
195 200 205 

Leu Phe Glu Tyr Arg His Arg Lys Met Lys Pro Val Gln Met Val Pro 
210 215 220 

Ala Ile Ser Gln Asn Trp Thr Phe His Gly Pro Gly Ala Thr Gly Gln 
225 230 235 240 




Ala Ala Ala Asn Trp Ile Ala Gly Phe Gly Arg Gly Pro Met Pro Pro 
245 250 255 

Thr Leu Leu Gly Ile Arg Gln Asn Gly His Ala Ala Ser Arg Arg Leu 
260 265 270 

Leu Gly Met Asp Glu Val Lys Gly Glu Lys Gln Leu Gly Arg Met Phe 
275 280 285 

Tyr Ala Ile Thr Leu Leu Phe Leu Leu Leu Trp Ser Pro Tyr Ile Val 
290 295 300 

Ala Cys Tyr Trp Arg Val Phe Val Lys Ala Cys Ala Val Pro His Arg 
305 310 315 320 

Tyr Leu Ala Thr Ala Val Trp Met Ser Phe Ala Gln Ala Ala Val Asn 
325 330 335 

Pro Ile Val Cys Phe Leu Leu Asn Lys Asp Leu Lys Lys Cys Leu Thr 
340 345 350 

Thr His Ala Pro Cys Trp Gly Thr Gly Gly Ala Pro Ala Pro Arg Glu 
355 360 365 

Pro Tyr Cys Val Met 
370 

(24) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1053 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS : single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 

ATGGCTTTGG AACAGAACCA GTCAACAGAT TATTATTATG AGGAAAATGA AATGAATGGC 60 

ACTTATGACT ACAGTCAATA TGAATTGATC TGTATCAAAG AAGATGTCAG AGAATTTGCA 120 

AAAGTTTTCC TCCCTGTATT CCTCACAATA GCTTTCGTCA TTGGACTTGC AGGCAATTCC 180 

ATGGTAGTGG CAATTTATGC CTATTACAAG AAACAGAGAA CCAAAACAGA TGTGTACATC 240 

CTGAATTTGG CTGTAGCAGA TTTACTCCTT CTATTCACTC TGCCTTTTTG GGCTGTTAAT 300 

GCAGTTCATG GGTGGGTTTT AGGGAAAATA ATGTGCAAAA TAACTTCAGC CTTGTACACA 360 

CTAAACTTTG TCTCTGGAAT GCAGTTTCTG GCTTGCATCA GCATAGACAG ATATGTGGCA 420 

GTAACTAATG TCCCCAGCCA ATCAGGAGTG GGAAAACCAT GCTGGATCAT CTGTTTCTGT 480 

GTCTGGATGG CTGCCATCTT GCTGAGCATA CCCCAGCTGG TTTTTTATAC AGTAAATGAC 540 

AATGCTAGGT GCATTCCCAT TTTCCCCCGC TACGTAGGAA CATCAATGAA AGCATTGATT 600 

CAAATGCTAG AGATCTGCAT TGGATTTGTA GTACCCTTTC TTATTATGGG GGTGTGCTAC 660 

TTTATCACGG CAAGGACACT CATGAAGATG CCAAACATTA AAATATCTCG ACCCCTAAAA 720 

GTTCTGCTCA CAGTCGTTAT AGTTTTCATT GTCACTCAAC TGCCTTATAA CATTGTCAAG 780 

TTCTGCCGAG CCATAGACAT CATCTACTCC CTGATCACCA GCTGCAACAT GAGCAAACGC 840 

ATGGACATCG CCATCCAAGT CACAGAAAGC ATTGCACTCT TTCACAGCTG CCTCAACCCA 900 

ATCCTTTATG TTTTTATGGG AGCATCTTTC AAAAACTACG TTATGAAAGT GGCCAAGAAA 960 

TATGGGTCCT GGAGAAGACA GAGACAAAGT GTGGAGGAGT TTCCTTTTGA TTCTGAGGGT 1020 

CCTACAGAGC CAACCAGTAC TTTTAGCATT TAA 1053 
(25) INFORMATION FOR SEQ ID NO:24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 350 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 

Met Ala Leu Glu Gln Asn Gln Ser Thr Asp Tyr Tyr Tyr Glu Glu Asn 
1 5 10 15 

Glu Met Asn Gly Thr Tyr Asp Tyr Ser Gln Tyr Glu Leu Ile Cys Ile 
20 25 30 

Lys Glu Asp Val Arg Glu Phe Ala Lys Val Phe Leu Pro Val Phe Leu 
35 40 45 

Thr Ile Ala Phe Val Ile Gly Leu Ala Gly Asn Ser Met Val Val Ala 
50 55 60 

Ile Tyr Ala Tyr Tyr Lys Lys Gln Arg Thr Lys Thr Asp Val Tyr Ile 
65 70 75 80 

Leu Asn Leu Ala Val Ala Asp Leu Leu Leu Leu Phe Thr Leu Pro Phe 
85 90 95 

Trp Ala Val Asn Ala Val His Gly Trp Val Leu Gly Lys Ile Met Cys 
100 105 110 

Lys Ile Thr Ser Ala Leu Tyr Thr Leu Asn Phe Val Ser Gly Met Gln 
115 120 125 

Phe Leu Ala Cys Ile Ser Ile Asp Arg Tyr Val Ala Val Thr Asn Val 
130 135 140 

Pro Ser Gln Ser Gly Val Gly Lys Pro Cys Trp Ile Ile Cys Phe Cys 
145 150 155 160 

Val Trp Met Ala Ala Ile Leu Leu Ser Ile Pro Gln Leu Val Phe Tyr 
165 170 175 

Thr Val Asn Asp Asn Ala Arg Cys Ile Pro Ile Phe Pro Arg Tyr Leu 
180 185 190 

Gly Thr Ser Met Lys Ala Leu Ile Gln Met Leu Glu Ile Cys Ile Gly 
195 200 205 

Phe Val Val Pro Phe Leu Ile Met Gly Val Cys Tyr Phe Ile Thr Ala 
210 215 220 

Arg Thr Leu Met Lys Met Pro Asn Ile Lys Ile Ser Arg Pro Leu Lys 
225 230 235 240 

Val Leu Leu Thr Val Val Ile Val Phe Ile Val Thr Gln Leu Pro Tyr 
245 250 255 

Asn Ile Val Lys Phe Cys Arg Ala Ile Asp Ile Ile Tyr Ser Leu Ile 
260 265 270 

Thr Ser Cys Asn Met Ser Lys Arg Met Asp Ile Ala Ile Gln Val Thr 
275 280 285 

Glu Ser Ile Ala Leu Phe His Ser Cys Leu Asn Pro Ile Leu Tyr Val 
290 295 300 

Phe Met Gly Ala Ser Phe Lys Asn Tyr Val Met Lys Val Ala Lys Lys 
305 310 315 320 

Tyr Gly Ser Trp Arg Arg Gln Arg Gln Ser Val Glu Glu Phe Pro Phe 
325 330 335 

Asp Ser Glu Gly Pro Thr Glu Pro Thr Ser Thr Phe Ser Ile 
340 345 350 

(26) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1116 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 

ATGCCAGGAA ACGCCACCCC AGTGACCACC ACTGCCCCGT GGGCCTCCCT GGGCCTCTCC 60 

GCCAAGACCT GCAACAACGT GTCCTTCGAA GAGAGCAGGA TAGTCCTGGT CGTGGTGTAC 120 

AGCGCGGTGT GCACGCTGGG GGTGCCGGCC AACTGCCTGA CTGCGTGGCT GGCGCTGCTG 180 

CAGGTACTGC AGGGCAACGT GCTGGCCGTC TACCTGCTCT GCCTGGCACT CTGCGAACTG 240 

CTGTACACAG GCACGCTGCC ACTCTGGGTC ATCTATATCC GCAACCAGCA CCGCTGGACC 300 

CTAGGCCTGC TGGCCTCGAA GGTGACCGCC TACATCTTCT TCTGCAACAT CTACGTCAGC 360 

ATCCTCTTCC TGTGCTGCAT CTCCTGCGAC CGCTTCGTGG CCGTGGTGTA CGCGCTGGAG 420 

AGTCGGGGCC GCCGCCGCCG GAGGACCGCC ATCCTCATCT CCGCCTGCAT CTTCATCCTC 480 

GTCGGGATCG TTCACTACCC GGTGTTCCAG ACGGAAGACA AGGAGACCTG CTTTGACATG 540 

CTGCAGATGG ACAGCAGGAT TGCCGGGTAC TACTACGCCA GGTTCACCGT TGGCTTTGCC 600 

ATCCCTCTCT CCATCATCGC CTTCACCAAC CACCGGATTT TCAGGAGCAT CAAGCAGAGC 660 

ATGGGCTTAA GCGCTGCCCA GAAGGCCAAG GTGAAGCACT CGGCCATCGC GGTGGTTGTC 720 

ATCTTCCTAG TCTGCTTCGC CCCGTACCAC CTGGTTCTCC TCGTCAAAGC CGCTGCCTTT 780 

TCCTACTACA GAGGAGACAG GAACGCCATG TGCGGCTTGG AGGAAAGGCT GTACACAGCC 840 

TCTGTGGTGT TTCTGTGCCT GTCCACGGTG AACGGCGTGG CTGACCCCAT TATCTACGTG 900 

CTGGCCACGG ACCATTCCCG CCAAGAAGTG TCCAGAATCC ATAAGGGGTG GAAAGAGTGG 960 

TCCATGAAGA CAGACGTCAC CAGGCTCACC CACAGCAGGG ACACCGAGGA GCTGCAGTCG 1020 

CCCGTGGCCC TTGCAGACCA CTACACCTTC TCCAGGCCCG TGCACCCACC AGGGTCACCA 1080 

TGCCCTGCAA AGAGGCTGAT TGAGGAGTCC TGCTGA 1116 
(27) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 371 amino acids 
(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26: 

Met Pro Gly Asn Ala Thr Pro Val Thr Thr Thr Ala Pro Trp Ala Ser 
1 5 10 15 
Leu Gly Leu Ser Ala Lys Thr Cys Asn Asn Val Ser Phe Glu Glu Ser 
20 25 30 

Arg Ile Val Leu Val Val Val Tyr Ser Ala Val Cys Thr Leu Gly Val 
35 40 45 

Pro Ala Asn Cys Leu Thr Ala Trp Leu Ala Leu Leu Gln Val Leu Gln 
50 55 60 

Gly Asn Val Leu Ala Val Tyr Leu Leu Cys Leu Ala Leu Cys Glu Leu 
65 70 75 80 

Leu Tyr Thr Gly Thr Leu Pro Leu Trp Val Ile Tyr Ile Arg Asn Gln 
85 90 95 

His Arg Trp Thr Leu Gly Leu Leu Ala Ser Lys Val Thr Ala Tyr Ile 
100 105 110 

Phe Phe Cys Asn Ile Tyr Val Ser Ile Leu Phe Leu Cys Cys Ile Ser 
115 120 125 

Cys Asp Arg Phe Val Ala Val Val Tyr Ala Leu Glu Ser Arg Gly Arg 
130 135 140 

Arg Arg Arg Arg Thr Ala Ile Leu Ile Ser Ala Cys Ile Phe Ile Leu 
145 150 155 160 

Val Gly Ile Val His Tyr Pro Val Phe Gln Thr Glu Asp Lys Glu Thr 
165 170 175 

Cys Phe Asp Met Leu Gln Met Asp Ser Arg Ile Ala Gly Tyr Tyr Tyr 
180 185 190 

Ala Arg Phe Thr Val Gly Phe Ala Ile Pro Leu Ser Ile Ile Ala Phe 
195 200 205 

Thr Asn His Arg Ile Phe Arg Ser Ile Lys Gln Ser Met Gly Leu Ser 
210 215 220 

Ala Ala Gln Lys Ala Lys Val Lys His Ser Ala Ile Ala Val Val Val 
225 230 235 240 

Ile Phe Leu Val Cys Phe Ala Pro Tyr His Leu Val Leu Leu Val Lys 
245 250 255 

Ala Ala Ala Phe Ser Tyr Tyr Arg Gly Asp Arg Asn Ala Met Cys Gly 
260 265 270 

Leu Glu Glu Arg Leu Tyr Thr Ala Ser Val Val Phe Leu Cys Leu Ser 
275 280 285 

Thr Val Asn Gly Val Ala Asp Pro Ile Ile Tyr Val Leu Ala Thr Asp 
290 295 300 

His Ser Arg Gln Glu Val Ser Arg Ile His Lys Gly Trp Lys Glu Trp 
305 310 315 320 

Ser Met Lys Thr Asp Val Thr Arg Leu Thr His Ser Arg Asp Thr Glu 
325 330 335 

Glu Leu Gln Ser Pro Val Ala Leu Ala Asp His Tyr Thr Phe Ser Arg 
340 345 350 

Pro Val His Pro Pro Gly Ser Pro Cys Pro Ala Lys Arg Leu Ile Glu 
355 360 365 

Glu Ser Cys 
370 

(28) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1113 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 

ATGGCGAACT ATAGCCATGC AGCTGACAAC ATTTTGCAAA ATCTCTCGCC TCTAACAGCC 60 

TTTCTGAAAC TGACTTCCTT GGGTTTCATA ATAGGAGTCA GCGTGGTGGG CAACCTCCTG 120 

ATCTCCATTT TGCTAGTGAA AGATAAGACC TTGCATAGAG CACCTTACTA CTTCCTGTTG 180 

GATCTTTGCT GTTCAGATAT CCTCAGATCT GCAATTTGTT TCCCATTTGT GTTCAACTCT 240 

GTCAAAAATG GCTCTACCTG GACTTATGGG ACTCTGACTT GCAAAGTGAT TGCCTTTCTG 300 

GGGGTTTTGT CCTGTTTCCA CACTGCTTTC ATGCTCTTCT GCATCAGTGT CACCAGATAC 360 

TTAGCTATCG CCCATCACCG CTTCTATACA AAGAGGCTGA CCTTTTGGAC GTGTCTGGCT 420 

GTGATCTGTA TGGTGTGGAC TCTGTCTGTG GCCATGGCAT TTCCCCCGGT TTTAGACGTG 480 

GGCACTTACT CATTCATTAG GGAGGAAGAT CAATGCACCT TCCAACACCG CTCCTTCAGG 540 

GCTAATGATT CCTTAGGATT TATGCTGCTT CTTGCTCTCA TCCTCCTAGC CACACAGCTT 600 

GTCTACCTCA AGCTGATATT TTTCGTCCAC GATCGAAGAA AAATGAAGCC AGTCCAGTTT 660 

GTAGCAGCAG TCAGCCAGAA CTGGACTTTT CATGGTCCTG GAGCCAGTGG CCAGGCAGCT 720 

GCCAATTGGC TAGCAGGATT TGGAAGGGGT CCCACACCAC CCACCTTGCT GGGCATCAGG 780 

CAAAATGCAA ACACCACAGG CAGAAGAAGG CTATTGGTCT TAGACGAGTT CAAAATGGAG 840 

AAAAGAATCA GCAGAATGTT CTATATAATG ACTTTTCTGT TTCTAACCTT GTGGGGCCCC 900 

TACCTGGTGG CCTGTTATTG GAGAGTTTTT GCAAGAGGGC CTGTAGTACC AGGGGGATTT 960 

CTAACAGCTG CTGTCTGGAT GAGTTTTGCC CAAGCAGGAA TCAATCCTTT TGTCTGCATT 1020 

TTCTCAAACA GGGAGCTGAG GCGCTGTTTC AGCACAACCC TTCTTTACTG CAGAAAATCC 1080 

AGGTTACCAA GGGAACCTTA CTGTGTTATA TGA 1113 
(29) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 370 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 

Met Ala Asn Tyr Ser His Ala Ala Asp Asn Ile Leu Gln Asn Leu Ser 
1 5 10 15 

Pro Leu Thr Ala Phe Leu Lys Leu Thr Ser Leu Gly Phe Ile Ile Gly 
20 25 30 

Val Ser Val Val Gly Asn Leu Leu Ile Ser Ile Leu Leu Val Lys Asp 
35 40 45 

Lys Thr Leu His Arg Ala Pro Tyr Tyr Phe Leu Leu Asp Leu Cys Cys 
50 55 60 

Ser Asp Ile Leu Arg Ser Ala Ile Cys Phe Pro Phe Val Phe Asn Ser 
65 70 75 80 

Val Lys Asn Gly Ser Thr Trp Thr Tyr Gly Thr Leu Thr Cys Lys Val 
85 90 95 

Ile Ala Phe Leu Gly Val Leu Ser Cys Phe His Thr Ala Phe Met Leu 
100 105 110 

Phe Cys Ile Ser Val Thr Arg Tyr Leu Ala Ile Ala His His Arg Phe 
115 120 125 

Tyr Thr Lys Arg Leu Thr Phe Trp Thr Cys Leu Ala Val Ile Cys Met 
130 135 140 

Val Trp Thr Leu Ser Val Ala Met Ala Phe Pro Pro Val Leu Asp Val 
145 150 155 160 

Gly Thr Tyr Ser Phe Ile Arg Glu Glu Asp Gln Cys Thr Phe Gln His 
165 170 175 

Arg Ser Phe Arg Ala Asn Asp Ser Leu Gly Phe Met Leu Leu Leu Ala 
180 185 190 

Leu Ile Leu Leu Ala Thr Gln Leu Val Tyr Leu Lys Leu Ile Phe Phe 
195 200 205 

Val His Asp Arg Arg Lys Met Lys Pro Val Gln Phe Val Ala Ala Val 
210 215 220 

Ser Gln Asn Trp Thr Phe His Gly Pro Gly Ala Ser Gly Gln Ala Ala 
225 230 235 240 

Ala Asn Trp Leu Ala Gly Phe Gly Arg Gly Pro Thr Pro Pro Thr Leu 
245 250 255 

Leu Gly Ile Arg Gln Asn Ala Asn Thr Thr Gly Arg Arg Arg Leu Leu 
260 265 270 

Val Leu Asp Glu Phe Lys Met Glu Lys Arg Ile Ser Arg Met Phe Tyr 
275 280 285 

Ile Met Thr Phe Leu Phe Leu Thr Leu Trp Gly Pro Tyr Leu Val Ala 
290 295 300 

Cys Tyr Trp Arg Val Phe Ala Arg Gly Pro Val Val Pro Gly Gly Phe 
305 310 315 320 

Leu Thr Ala Ala Val Trp Met Ser Phe Ala Gln Ala Gly Ile Asn Pro 
325 330 335 

Phe Val Cys Ile Phe Ser Asn Arg Glu Leu Arg Arg Cys Phe Ser Thr 
340 345 350 

Thr Leu Leu Tyr Cys Arg Lys Ser Arg Leu Pro Arg Glu Pro Tyr Cys 
355 360 365 

Val Ile 
370 



(30) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1080 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 
ATGCAGGTCC CGAACAGCAC CGGCCCGGAC AACGCGACGC TGCAGATGCT GCGGAACCCG 60 




GCGATCGCGG TGGCCCTGCC CGTGGTGTAC TCGCTGGTGG CGGCGGTCAG CATCCCGGGC 120 

AACCTCTTCT CTCTGTGGGT GCTGTGCCGG CGCATGGGGC CCAGATCCCC GTCGGTCATC 180 

TTCATGATCA ACCTGAGCGT CACGGACCTG ATGCTGGCCA GCGTGTTGCC TTTCCAAATC 240 

TACTACCATT GCAACCGCCA CCACTGGGTA TTCGGGGTGC TGCTTTGCAA CGTGGTGACC 300 

GTGGCCTTTT ACGCAAACAT GTATTCCAGC ATCCTCACCA TGACCTGTAT CAGCGTGGAG 360 

CGCTTCCTGG GGGTCCTGTA CCCGCTCAGC TCCAAGCGCT GGCGCCGCCG TCGTTACGCG 420 

GTGGCCGCGT GTGCAGGGAC CTGGCTGCTG CTCCTGACCG CCCTGTGCCC GCTGGCGCGC 480 

ACCGATCTCA CCTACCCGGT GCACGCCCTG GGCATCATCA CCTGCTTCGA CGTCCTCAAG 540 

TGGACGATGC TCCCCAGCGT GGCCATGTGG GCCGTGTTCC TCTTCACCAT CTTCATCCTG 600 

CTGTTCCTCA TCCCGTTCGT GATCACCGTG GCTTGTTACA CGGCCACCAT CCTCAAGCTG 660 

TTGCGCACGG AGGAGGCGCA CGGCCGGGAG CAGCGGAGGC GCGCGGTGGG CCTGGCCGCG 720 

GTGGTCTTGC TGGCCTTTGT CACCTGCTTC GCCCCCAACA ACTTCGTGCT CCTGGCGCAC 780 

ATCGTGAGCC GCCTGTTCTA CGGCAAGAGC TACTACCACG TGTACAAGCT CACGCTGTGT 840 

CTCAGCTGCC TCAACAACTG TCTGGACCCG TTTGTTTATT ACTTTGCGTC CCGGGAATTC 900 

CAGCTGCGCC TGCGGGAATA TTTGGGCTGC CGCCGGGTGC CCAGAGACAC CCTGGACACG 960 

CGCCGCGAGA GCCTCTTCTC CGCCAGGACC ACGTCCGTGC GCTCCGAGGC CGGTGCGCAC 1020 

CCTGAAGGGA TGGAGGGAGC CACCAGGCCC GGCCTCCAGA GGCAGGAGAG TGTGTTCTGA 1080 

(31) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 359 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 

Met Gln Val Pro Asn Ser Thr Gly Pro Asp Asn Ala Thr Leu Gln Met 
1 5 10 15 

Leu Arg Asn Pro Ala Ile Ala Val Ala Leu Pro Val Val Tyr Ser Leu 
20 25 30 

Val Ala Ala Val Ser Ile Pro Gly Asn Leu Phe Ser Leu Trp Val Leu 
35 40 45 

Cys Arg Arg Met Gly Pro Arg Ser Pro Ser Val Ile Phe Met Ile Asn 
50 55 60 

Leu Ser Val Thr Asp Leu Met Leu Ala Ser Val Leu Pro Phe Gln Ile 
65 70 75 80 

Tyr Tyr His Cys Asn Arg His His Trp Val Phe Gly Val Leu Leu Cys 
85 90 95 

Asn Val Val Thr Val Ala Phe Tyr Ala Asn Met Tyr Ser Ser Ile Leu 
100 105 110 

Thr Met Thr Cys Ile Ser Val Glu Arg Phe Leu Gly Val Leu Tyr Pro 
115 120 125 

Leu Ser Ser Lys Arg Trp Arg Arg Arg Arg Tyr Ala Val Ala Ala Cys 
130 135 140 

Ala Gly Thr Trp Leu Leu Leu Leu Thr Ala Leu Cys Pro Leu Ala Arg 
145 150 155 160 

Thr Asp Leu Thr Tyr Pro Val His Ala Leu Gly Ile Ile Thr Cys Phe 
165 170 175 

Asp Val Leu Lys Trp Thr Met Leu Pro Ser Val Ala Met Trp Ala Val 
180 185 190 

Phe Leu Phe Thr Ile Phe Ile Leu Leu Phe Leu Ile Pro Phe Val Ile 
195 200 205 

Thr Val Ala Cys Tyr Thr Ala Thr Ile Leu Lys Leu Leu Arg Thr Glu 
210 215 220 

Glu Ala His Gly Arg Glu Gln Arg Arg Arg Ala Val Gly Leu Ala Ala 
225 230 235 240 

Val Val Leu Leu Ala Phe Val Thr Cys Phe Ala Pro Asn Asn Phe Val 
245 250 255 

Leu Leu Ala His Ile Val Ser Arg Leu Phe Tyr Gly Lys Ser Tyr Tyr 
260 265 270 

His Val Tyr Lys Leu Thr Leu Cys Leu Ser Cys Leu Asn Asn Cys Leu 
275 280 285 

Asp Pro Phe Val Tyr Tyr Phe Ala Ser Arg Glu Phe Gln Leu Arg Leu 
290 295 300 

Arg Glu Tyr Leu Gly Cys Arg Arg Val Pro Arg Asp Thr Leu Asp Thr 
305 310 315 320 

Arg Arg Glu Ser Leu Phe Ser Ala Arg Thr Thr Ser Val Arg Ser Glu 
325 330 335 

Ala Gly Ala His Pro Glu Gly Met Glu Gly Ala Thr Arg Pro Gly Leu 
340 345 350 

Gln Arg Gln Glu Ser Val Phe 
355 

(32) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1503 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 

ATGGAGCGTC CCTGGGAGGA CAGCCCAGGC CCGGAGGGGG CAGCTGAGGG CTCGCCTGTG 60 

CCAGTCGCCG CCGGGGCGCG CTCCGGTGCC GCGGCGAGTG GCACAGGCTG GCAGCCATGG 120 

GCTGAGTGCC CGGGACCCAA GGGGAGGGGG CAACTGCTGG CGACCGCCGG CCCTTTGCGT 180 

CGCTGGCCCG CCCCCTCGCC TGCCAGCTCC AGCCCCGCCC CCGGAGCGGC GTCCGCTCAC 240 

TCGGTTCAAG GCAGCGCGAC TGCGGGTGGC GCACGACCAG GGCGCAGACC TTGGGGCGCG 300 

CGGCCCATGG AGTCGGGGCT GCTGCGGCCG GCGCCGGTGA GCGAGGTCAT CGTCCTGCAT 360 

TACAACTACA CCGGCAAGCT CCGCGGTGCG AGCTACCAGC CGGGTGCCGG CCTGCGCGCC 420 

GACGCCGTGG TGTGCCTGGC GGTGTGCGCC TTCATCGTGC TAGAGAATCT AGCCGTGTTG 480 

TTGGTGCTCG GACGCCACCC GCGCTTCCAC GCTCCCATGT TCCTGCTCCT GGGCAGCCTC 540 

ACGTTGTCGG ATCTGCTGGC AGGCGCCGCC TACGCCGCCA ACATCCTACT GTCGGGGCCG 600 

CTCACGCTGA AACTGTCCCC CGCGCTCTGG TTCGCACGGG AGGGAGGCGT CTTCGTGGCA 660 

CTCACTGCGT CCGTGCTGAG CCTCCTGGCC ATCGCGCTGG AGCGCAGCCT CACCATGGCG 720 

CGCAGGGGGC CCGCGCCCGT CTCCAGTCGG GGGCGCACGC TGGCGATGGC AGCCGCGGCC 780 

TGGGGCGTGT CGCTGCTCCT CGGGCTCCTG CCAGCGCTGG GCTGGAATTG CCTGGGTCGC 840 

CTGGACGCTT GCTCCACTGT CTTGCCGCTC TACGCCAAGG CCTACGTGCT CTTCTGCGTG 900 

CTCGCCTTCG TGGGCATCCT GGCCGCGATC TGTGCACTCT ACGCGCGCAT CTACTGCCAG 960 

GTACGCGCCA ACGCGCGGCG CCTGCCGGCA CGGCCCGGGA CTGCGGGGAC CACCTCGACC 1020 

CGGGCGCGTC GCAAGCCGCG CTCTCTGGCC TTGCTGCGCA CGCTCAGCGT GGTGCTCCTG 1080 

GCCTTTGTGG CATGTTGGGG CCCCCTCTTC CTGCTGCTGT TGCTCGACGT GGCGTGCCCG 1140 

GCGCGCACCT GTCCTGTACT CCTGCAGGCC GATCCCTTCC TGGGACTGGC CATGGCCAAC 1200 

TCACTTCTGA ACCCCATCAT CTACACGCTC ACCAACCGCG ACCTGCGCCA CGCGCTCCTG 1260 

CGCCTGGTCT GCTGCGGACG CCACTCCTGC GGCAGAGACC CGAGTGGCTC CCAGCAGTCG 1320 

GCGAGCGCGG CTGAGGCTTC CGGGGGCCTG CGCCGCTGCC TGCCCCCGGG CCTTGATGGG 1380 

AGCTTCAGCG GCTCGGAGCG CTCATCGCCC CAGCGCGACG GGCTGGACAC CAGCGGCTCC 1440 

ACAGGCAGCC CCGGTGCACC CACAGCCGCC CGGACTCTGG TATCAGAACC GGCTGCAGAC 1500 

TGA 1503 
(33) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 500 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 

Met Glu Arg Pro Trp Glu Asp Ser Pro Gly Pro Glu Gly Ala Ala Glu 
1 5 10 15 

Gly Ser Pro Val Pro Val Ala Ala Gly Ala Arg Ser Gly Ala Ala Ala 
20 25 30 

Ser Gly Thr Gly Trp Gln Pro Trp Ala Glu Cys Pro Gly Pro Lys Gly 
35 40 45 

Arg Gly Gln Leu Leu Ala Thr Ala Gly Pro Leu Arg Arg Trp Pro Ala 
50 55 60 

Pro Ser Pro Ala Ser Ser Ser Pro Ala Pro Gly Ala Ala Ser Ala His 
65 70 75 80 

Ser Val Gln Gly Ser Ala Thr Ala Gly Gly Ala Arg Pro Gly Arg Arg 
85 90 95 

Pro Trp Gly Ala Arg Pro Met Glu Ser Gly Leu Leu Arg Pro Ala Pro 
100 105 110 

Val Ser Glu Val Ile Val Leu His Tyr Asn Tyr Thr Gly Lys Leu Arg 
115 120 125 

Gly Ala Ser Tyr Gln Pro Gly Ala Gly Leu Arg Ala Asp Ala Val Val 
130 135 140 

Cys Leu Ala Val Cys Ala Phe Ile Val Leu Glu Asn Leu Ala Val Leu 
145 150 155 160 

Leu Val Leu Gly Arg His Pro Arg Phe His Ala Pro Met Phe Leu Leu 
165 170 175 

Leu Gly Ser Leu Thr Leu Ser Asp Leu Leu Ala Gly Ala Ala Tyr Ala 
180 185 190 

Ala Asn Ile Leu Leu Ser Gly Pro Leu Thr Leu Lys Leu Ser Pro Ala 
195 200 205 

Leu Trp Phe Ala Arg Glu Gly Gly Val Phe Val Ala Leu Thr Ala Ser 
210 215 220 

Val Leu Ser Leu Leu Ala Ile Ala Leu Glu Arg Ser Leu Thr Met Ala 
225 230 235 240 

Arg Arg Gly Pro Ala Pro Val Ser Ser Arg Gly Arg Thr Leu Ala Met 
245 250 255 

Ala Ala Ala Ala Trp Gly Val Ser Leu Leu Leu Gly Leu Leu Pro Ala 
260 265 270 

Leu Gly Trp Asn Cys Leu Gly Arg Leu Asp Ala Cys Ser Thr Val Leu 
275 280 285 

Pro Leu Tyr Ala Lys Ala Tyr Val Leu Phe Cys Val Leu Ala Phe Val 
290 295 300 

Gly Ile Leu Ala Ala Ile Cys Ala Leu Tyr Ala Arg Ile Tyr Cys Gln 
305 310 315 320 

Val Arg Ala Asn Ala Arg Arg Leu Pro Ala Arg Pro Gly Thr Ala Gly 
325 330 335 

Thr Thr Ser Thr Arg Ala Arg Arg Lys Pro Arg Ser Leu Ala Leu Leu 
340 345 350 

Arg Thr Leu Ser Val Val Leu Leu Ala Phe Val Ala Cys Trp Gly Pro 
355 360 365 

Leu Phe Leu Leu Leu Leu Leu Asp Val Ala Cys Pro Ala Arg Thr Cys 
370 375 380 

Pro Val Leu Leu Gln Ala Asp Pro Phe Leu Gly Leu Ala Met Ala Asn 
385 390 395 400 

Ser Leu Leu Asn Pro Ile Ile Tyr Thr Leu Thr Asn Arg Asp Leu Arg 
405 410 415 

His Ala Leu Leu Arg Leu Val Cys Cys Gly Arg His Ser Cys Gly Arg 
420 425 430 

Asp Pro Ser Gly Ser Gln Gln Ser Ala Ser Ala Ala Glu Ala Ser Gly 
435 440 445 

Gly Leu Arg Arg Cys Leu Pro Pro Gly Leu Asp Gly Ser Phe Ser Gly 
450 455 460 

Ser Glu Arg Ser Ser Pro Gln Arg Asp Gly Leu Asp Thr Ser Gly Ser 
465 470 475 480 

Thr Gly Ser Pro Gly Ala Pro Thr Ala Ala Arg Thr Leu Val Ser Glu 
485 490 495 

Pro Ala Ala Asp 
500 

(34) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1029 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 
ATGCAAGCCG TCGACAATCT CACCTCTGCG CCTGGGAACA CCAGTCTGTG CACCAGAGAC 60 

TACAAAATCA CCCAGGTCCT CTTCCCACTG CTCTACACTG TCCTGTTTTT TGTTGGACTT 120 

ATCACAAATG GCCTGGCGAT GAGGATTTTC TTTCAAATCC GGAGTAAATC AAACTTTATT 180 

ATTTTTCTTA AGAACACAGT CATTTCTGAT CTTCTCATGA TTCTGACTTT TCCATTCAAA 240 

ATTCTTAGTG ATGCCAAACT GGGAACAGGA CCACTGAGAA CTTTTGTGTG TCAAGTTACC 300 

TCCGTCATAT TTTATTTCAC AATGTATATC AGTATTTCAT TCCTGGGACT GATAACTATC 360 

GATCGCTACC AGAAGACCAC CAGGCCATTT AAAACATCCA ACCCCAAAAA TCTCTTGGGG 420 

GCTAAGATTC TCTCTGTTGT CATCTGGGCA TTCATGTTCT TACTCTCTTT GCCTAACATG 480 

ATTCTGACCA ACAGGCAGCC GAGAGACAAG AATGTGAAGA AATGCTCTTT CCTTAAATCA 540 

GAGTTCGGTC TAGTCTGGCA TGAAATAGTA AATTACATCT GTCAAGTCAT TTTCTGGATT 600 

AATTTCTTAA TTGTTATTGT ATGTTATACA CTCATTACAA AAGAACTGTA CCGGTCATAC 660 

GTAAGAACGA GGGGTGTAGG TAAAGTCCCC AGGAAAAAGG TGAACGTCAA AGTTTTCATT 720 

ATCATTGCTG TATTCTTTAT TTGTTTTGTT CCTTTCCATT TTGCCCGAAT TCCTTACACC 780 

CTGAGCCAAA CCCGGGATGT CTTTGACTGC ACTGCTGAAA ATACTCTGTT CTATGTGAAA 840 

GAGAGCACTC TGTGGTTAAC TTCCTTAAAT GCATGCCTGG ATCCGTTCAT CTATTTTTTC 900 

CTTTGCAAGT CCTTCAGAAA TTCCTTGATA AGTATGCTGA AGTGCCCCAA TTCTGCAACA 960 

TCTCTGTCCC AGGACAATAG GAAAAAAGAA CAGGATGGTG GTGACCCAAA TGAAGAGACT 1020 

CCAATGTAA 1029 
(35) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 342 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 

Met Gln Ala Val Asp Asn Leu Thr Ser Ala Pro Gly Asn Thr Ser Leu 
1 5 10 15 

Cys Thr Arg Asp Tyr Lys Ile Thr Gln Val Leu Phe Pro Leu Leu Tyr 
20 25 30 

Thr Val Leu Phe Phe Val Gly Leu Ile Thr Asn Gly Leu Ala Met Arg 
35 40 45 

Ile Phe Phe Gln Ile Arg Ser Lys Ser Asn Phe Ile Ile Phe Leu Lys 
50 55 60 

Asn Thr Val Ile Ser Asp Leu Leu Met Ile Leu Thr Phe Pro Phe Lys 
65 70 75 80 

Ile Leu Ser Asp Ala Lys Leu Gly Thr Gly Pro Leu Arg Thr Phe Val 
85 90 95 

Cys Gln Val Thr Ser Val Ile Phe Tyr Phe Thr Met Tyr Ile Ser Ile 
100 105 110 

Ser Phe Leu Gly Leu Ile Thr Ile Asp Arg Tyr Gln Lys Thr Thr Arg 
115 120 125 

Pro Phe Lys Thr Ser Asn Pro Lys Asn Leu Leu Gly Ala Lys Ile Leu 
130 135 140 

Ser Val Val Ile Trp Ala Phe Met Phe Leu Leu Ser Leu Pro Asn Met 
145 150 155 160 

Ile Leu Thr Asn Arg Gln Pro Arg Asp Lys Asn Val Lys Lys Cys Ser 
165 170 175 

Phe Leu Lys Ser Glu Phe Gly Leu Val Trp His Glu Ile Val Asn Tyr 
180 185 190 

Ile Cys Gln Val Ile Phe Trp Ile Asn Phe Leu Ile Val Ile Val Cys 
195 200 205 

Tyr Thr Leu Ile Thr Lys Glu Leu Tyr Arg Ser Tyr Val Arg Thr Arg 
210 215 220 

Gly Val Gly Lys Val Pro Arg Lys Lys Val Asn Val Lys Val Phe Ile 
225 230 235 240 

Ile Ile Ala Val Phe Phe Ile Cys Phe Val Pro Phe His Phe Ala Arg 
245 250 255 

Ile Pro Tyr Thr Leu Ser Gln Thr Arg Asp Val Phe Asp Cys Thr Ala 
260 265 270 

Glu Asn Thr Leu Phe Tyr Val Lys Glu Ser Thr Leu Trp Leu Thr Ser 
275 280 285 

Leu Asn Ala Cys Leu Asp Pro Phe Ile Tyr Phe Phe Leu Cys Lys Ser 
290 295 300 

Phe Arg Asn Ser Leu Ile Ser Met Leu Lys Cys Pro Asn Ser Ala Thr 
305 310 315 320 

Ser Leu Ser Gln Asp Asn Arg Lys Lys Glu Gln Asp Gly Gly Asp Pro 
325 330 335 

Asn Glu Glu Thr Pro Met 
340 

(36) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1077 base pairs 
(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 

ATGTCGGTCT GCTACCGTCC CCCAGGGAAC GAGACACTGC TGAGCTGGAA GACTTCGCGG 60 

GCCACAGGCA CAGCCTTCCT GCTGCTGGCG GCGCTGCTGG GGCTGCCTGG CAACGGCTTC 120 

GTGGTGTGGA GCTTGGCGGG CTGGCGGCCT GCACGGGGGC GACCGCTGGC GGCCACGCTT 180 

GTGCTGCACC TGGCGCTGGC CGACGGCGCG GTGCTGCTGC TCACGCCGCT CTTTGTGGCC 240 

TTCCTGACCC GGCAGGCCTG GCCGCTGGGC CAGGCGGGCT GCAAGGCGGT GTACTACGTG 300 

TGCGCGCTCA GCATGTACGC CAGCGTGCTG CTCACCGGCC TGCTCAGCCT GCAGCGCTGC 360 

CTCGCAGTCA CCCGCCCCTT CCTGGCGCCT CGGCTGCGCA GCCCGGCCCT GGCCCGCCGC 420 

CTGCTGCTGG CGGTCTGGCT GGCCGCCCTG TTGCTCGCCG TCCCGGCCGC CGTCTACCGC 480 

CACCTGTGGA GGGACCGCGT ATGCCAGCTG TGCCACCCGT CGCCGGTCCA CGCCGCCGCC 540 

CACCTGAGCC TGGAGACTCT GACCGCTTTC GTGCTTCCTT TCGGGCTGAT GCTCGGCTGC 600 

TACAGCGTGA CGCTGGCACG GCTGCGGGGC GCCCGCTGGG GCTCCGGGCG GCACGGGGCG 660 

CGGGTGGGCC GGCTGGTGAG CGCCATCGTG CTTGCCTTCG GCTTGCTCTG GGCCCCCTAC 720 

CACGCAGTCA ACCTTCTGCA GGCGGTCGCA GCGCTGGCTC CACCGGAAGG GGCCTTGGCG 780 

GAGCCGGCCA GGCGGCGCGA GCGGGAACTA CGGCCTTGGC CTTCTTCAGT 840 

TCTAGCGTCA ACCCGGTGCT CTACGTCTTC ACCGCTGGAG ATCTGCTGCC CCGGGCAGGT 900 

CCCCGTTTCC TCACGCGGCT CTTCGAAGGC TCTGGGGAGG CCCGAGGGGG CGGCCGCTCT 960 

AGGGAAGGGA CCATGGAGCT CCGAACTACC CCTCAGCTGA AAGTGGTGGG GCAGGGCCGC 1020 

GGCAATGGAG ACCCGGGGGG TGGGATGGAG AAGGACGGTC CGGAATGGGA CCTTTGA 1077 



(37) INFORMATION FOR SEQ ID NO:36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 358 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 

Met Ser Val Cys Tyr Arg Pro Pro Gly Asn Glu Thr Leu Leu Ser Trp 
1 5 10 15 

Lys Thr Ser Arg Ala Thr Gly Thr Ala Phe Leu Leu Leu Ala Ala Leu 
20 25 30 

Leu Gly Leu Pro Gly Asn Gly Phe Val Val Trp Ser Leu Ala Gly Trp 
35 40 45 

Arg Pro Ala Arg Gly Arg Pro Leu Ala Ala Thr Leu Val Leu His Leu 
50 55 60 

Ala Leu Ala Asp Gly Ala Val Leu Leu Leu Thr Pro Leu Phe Val Ala 
65 70 75 80 

Phe Leu Thr Arg Gln Ala Trp Pro Leu Gly Gln Ala Gly Cys Lys Ala 
85 90 95 

Val Tyr Tyr Val Cys Ala Leu Ser Met Tyr Ala Ser Val Leu Leu Thr 
100 105 110 

Gly Leu Leu Ser Leu Gln Arg Cys Leu Ala Val Thr Arg Pro Phe Leu 
115 120 125 

Ala Pro Arg Leu Arg Ser Pro Ala Leu Ala Arg Arg Leu Leu Leu Ala 
130 135 140 

Val Trp Leu Ala Ala Leu Leu Leu Ala Val Pro Ala Ala Val Tyr Arg 
145 150 155 160 

His Leu Trp Arg Asp Arg Val Cys Gln Leu Cys His Pro Ser Pro Val 
165 170 175 

His Ala Ala Ala His Leu Ser Leu Glu Thr Leu Thr Ala Phe Val Leu 
180 185 190 

Pro Phe Gly Leu Met Leu Gly Cys Tyr Ser Val Thr Leu Ala Arg Leu 
195 200 205 

Arg Gly Ala Arg Trp Gly Ser Gly Arg His Gly Ala Arg Val Gly Arg 
210 215 220 

Leu Val Ser Ala Ile Val Leu Ala Phe Gly Leu Leu Trp Ala Pro Tyr 
225 230 235 240 

His Ala Val Asn Leu Leu Gln Ala Val Ala Ala Leu Ala Pro Pro Glu 
245 250 255 

Gly Ala Leu Ala Lys Leu Gly Gly Ala Gly Gln Ala Ala Arg Ala Gly 
260 265 270 

Thr Thr Ala Leu Ala Phe Phe Ser Ser Ser Val Asn Pro Val Leu Tyr 
275 280 285 

Val Phe Thr Ala Gly Asp Leu Leu Pro Arg Ala Gly Pro Arg Phe Leu 
290 295 300 

Thr Arg Leu Phe Glu Gly Ser Gly Glu Ala Arg Gly Gly Gly Arg Ser 
305 310 315 320 

Arg Glu Gly Thr Met Glu Leu Arg Thr Thr Pro Gln Leu Lys Val Val 
325 330 335 

Gly Gln Gly Arg Gly Asn Gly Asp Pro Gly Gly Gly Met Glu Lys Asp 
340 345 350 

Gly Pro Glu Trp Asp Leu 
355 

(38) INFORMATION FOR SEQ ID NO: 37: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1005 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 



(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 

ATGCTGGGGA TCATGGCATG GAATGCAACT TGCAAAAACT GGCTGGCAGC AGAGGCTGCC 60 

CTGGAAAAGT ACTACCTTTC CATTTTTTAT GGGATTGAGT TCGTTGTGGG AGTCCTTGGA 120 

AATACCATTG TTGTTTACGG CTACATCTTC TCTCTGAAGA ACTGGAACAG CAGTAATATT 180 

TATCTCTTTA ACCTCTCTGT CTCTGACTTA GCTTTTCTGT GCACCCTCCC CATGCTGATA 240 

AGGAGTTATG CCAATGGAAA CTGGATATAT GGAGACGTGC TCTGCATAAG CAACCGATAT 300 

GTGCTTCATG CCAACCTCTA TACCAGCATT CTCTTTCTCA CTTTTATCAG CATAGATCGA 360 

TACTTGATAA TTAAGTATCC TTTCCGAGAA CACCTTCTGC AAAAGAAAGA GTTTGCTATT 420 

TTAATCTCCT TGGCCATTTG GGTTTTAGTA ACCTTAGAGT TACTACCCAT ACTTCCCCTT 480 

ATAAATCCTG TTATAACTGA CAATGGCACC ACCTGTAATG ATTTTGCAAG TTCTGGAGAC 540 

CCCAACTACA ACCTCATTTA CAGCATGTGT CTAACACTGT TGGGGTTCCT TATTCCTCTT 600 

TTTGTGATGT GTTTCTTTTA TTACAAGATT GCTCTCTTCC TAAAGCAGAG GAATAGGCAG 660 

GTTGCTACTG CTCTGCCCCT TGAAAAGCCT CTCAACTTGG TCATCATGGC AGTGGTAATC 720 

TTCTCTGTGC TTTTTACACC CTATCACGTC ATGCGGAATG TGAGGATCGC TTCACGCCTG 780 

GGGAGTTGGA AGCAGTATCA GTGCACTCAG GTCGTCATCA ACTCCTTTTA CATTGTGACA 840 

CGGCCTTTGG CCTTTCTGAA CAGTGTCATC AACCCTGTCT TCTATTTTCT TTTGGGAGAT 900 

CACTTCAGGG ACATGCTGAT GAATCAACTG AGACACAACT TCAAATCCCT TACATCCTTT 960 

AGCAGATGGG CTCATGAACT CCTACTTTCA TTCAGAGAAA AGTGA 1005 

(39) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 334 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 



(ii) MOLECULE TYPE: protein 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:38: 

Met Leu Gly Ile Met Ala Trp Asn Ala Thr Cys Lys Asn Trp Leu Ala 
1 5 10 15 

Ala Glu Ala Ala Leu Glu Lys Tyr Tyr Leu Ser Ile Phe Tyr Gly Ile 
20 25 30 

Glu Phe Val Val Gly Val Leu Gly Asn Thr Ile Val Val Tyr Gly Tyr 
35 40 45 

Ile Phe Ser Leu Lys Asn Trp Asn Ser Ser Asn Ile Tyr Leu Phe Asn 
50 55 60 

Leu Ser Val Ser Asp Leu Ala Phe Leu Cys Thr Leu Pro Met Leu Ile 
65 70 75 80 

Arg Ser Tyr Ala Asn Gly Asn Trp Ile Tyr Gly Asp Val Leu Cys Ile 
85 90 95 

Ser Asn Arg Tyr Val Leu His Ala Asn Leu Tyr Thr Ser Ile Leu Phe 
100 105 110 

Leu Thr Phe Ile Ser Ile Asp Arg Tyr Leu Ile Ile Lys Tyr Pro Phe 
115 120 125 

Arg Glu His Leu Leu Gln Lys Lys Glu Phe Ala Ile Leu Ile Ser Leu 
130 135 140 

Ala Ile Trp Val Leu Val Thr Leu Glu Leu Leu Pro Ile Leu Pro Leu 
145 150 155 160 

Ile Asn Pro Val Ile Thr Asp Asn Gly Thr Thr Cys Asn Asp Phe Ala 
165 170 175 

Ser Ser Gly Asp Pro Asn Tyr Asn Leu Ile Tyr Ser Met Cys Leu Thr 
180 185 190 

Leu Leu Gly Phe Leu Ile Pro Leu Phe Val Met Cys Phe Phe Tyr Tyr 
195 200 205 

Lys Ile Ala Leu Phe Leu Lys Gln Arg Asn Arg Gln Val Ala Thr Ala 
210 215 220 

Leu Pro Leu Glu Lys Pro Leu Asn Leu Val Ile Met Ala Val Val Ile 
225 230 235 240 

Phe Ser Val Leu Phe Thr Pro Tyr His Val Met Arg Asn Val Arg Ile 
245 250 255 

Ala Ser Arg Leu Gly Ser Trp Lys Gln Tyr Gln Cys Thr Gln Val Val 
260 265 270 

Ile Asn Ser Phe Tyr Ile Val Thr Arg Pro Leu Ala Phe Leu Asn Ser 
275 280 285 

Val Ile Asn Pro Val Phe Tyr Phe Leu Leu Gly Asp His Phe Arg Asp 
290 295 300 

Met Leu Met Asn Gln Leu Arg His Asn Phe Lys Ser Leu Thr Ser Phe 
305 310 315 320 

Ser Arg Trp Ala His Glu Leu Leu Leu Ser Phe Arg Glu Lys 
325 330 

(40) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1296 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:

ATGCAGGCGC TTAACATTAC CCCGGAGCAG TTCTCTCGGC TGCTGCGGGA CCACAACCTG 60 

ACGCGGGAGC AGTTCATCGC TCTGTACCGG CTGCGACCGC TCGTCTACAC CCCAGAGCTG 120

CCGGGACGCG CCAAGCTGGC CCTCGTGCTC ACCGGCGTGC TCATCTTCGC CCTGGCGCTC 180 

TTTGGCAATG CTCTGGTGTT CTACGTGGTG ACCCGCAGCA AGGCCATGCG CACCGTCACC 240 

AACATCTTTA TCTGCTCCTT GGCGCTCAGT GACCTGCTCA TCACCTTCTT CTGCATTCCC 300

GTCACCATGC TCCAGAACAT TTCCGACAAC TGGCTGGGGG GTGCTTTCAT TTGCAAGATG 360

GTGCCATTTG TCCAGTCTAC CGCTGTTGTG ACAGAAATGC TCACTATGAC CTGCATTGCT 420

GTGGAAAGGC ACCAGGGACT TGTGCATCCT TTTAAAATGA AGTGGCAATA CACCAACCGA 480

AGGGCTTTCA CAATGCTAGG TGTGGTCTGG CTGGTGGCAG TCATCGTAGG ATCACCCATG 540

TGGCACGTGC AACAACTTGA GATCAAATAT GACTTCCTAT ATGAAAAGGA ACACATCTGC 600

TGCTTAGAAG AGTGGACCAG CCCTGTGCAC CAGAAGATCT ACACCACCTT CATCCTTGTC 660

ATCCTCTTCC TCCTGCCTCT TATGGTGATG CTTATTCTGT ACAGTAAAAT TGGTTATGAA 720

CTTTGGATAA AGAAAAGAGT TGGGGATGGT TCAGTGCTTC GAACTATTCA TGGAAAAGAA 780

ATGTCCAAAA TAGCCAGGAA GAAGAAACGA GCTGTCATTA TGATGGTGAC AGTGGTGGCT 840

CTCTTTGCTG TGTGCTGGGC ACCATTCCAT GTTGTCCATA TGATGATTGA ATACAGTAAT 900

TTTGAAAAGG AATATGATGA TGTCACAATC AAGATGATTT TTGCTATCGT GCAAATTATT 960

GGATTTTCCA ACTCCATCTG TAATCCCATT GTCTATGCAT TTATGAATGA AAACTTCAAA 1020

AAAAATGTTT TGTCTGCAGT TTGTTATTGC ATAGTAAATA AAACCTTCTC TCCAGCACAA 1080

AGGCATGGAA ATTCAGGAAT TACAATGATG CGGAAGAAAG CAAAGTTTTC CCTCAGAGAG 1140

AATCCAGTGG AGGAAACCAA AGGAGAAGCA TTCAGTGATG GCAACATTGA AGTCAAATTG 1200

TGTGAACAGA CAGAGGAGAA GAAAAAGCTC AAACGACATC TTGCTCTCTT TAGGTCTGAA 1260

CTGGCTGAGA ATTCTCCTTT AGACAGTGGG CATTAA 1296
(41) INFORMATION FOR SEQ ID NO:40: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 431 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:40: 

Met Gln Ala Leu Asn Ile Thr Pro Glu Gln Phe Ser Arg Leu Leu Arg
1 5 10 15

Asp His Asn Leu Thr Arg Glu Gln Phe Ile Ala Leu Tyr Arg Leu Arg
20 25 30

Pro Leu Val Tyr Thr Pro Glu Leu Pro Gly Arg Ala Lys Leu Ala Leu
35 40 45

Val Leu Thr Gly Val Leu Ile Phe Ala Leu Ala Leu Phe Gly Asn Ala
50 55 60

Leu Val Phe Tyr Val Val Thr Arg Ser Lys Ala Met Arg Thr Val Thr
65 70 75 80

Asn Ile Phe Ile Cys Ser Leu Ala Leu Ser Asp Leu Leu Ile Thr Phe
85 90 95

Phe Cys Ile Pro Val Thr Met Leu Gln Asn Ile Ser Asp Asn Trp Leu
100 105 110

Gly Gly Ala Phe Ile Cys Lys Met Val Pro Phe Val Gln Ser Thr Ala
115 120 125

Val Val Thr Glu Met Leu Thr Met Thr Cys Ile Ala Val Glu Arg His
130 135 140

Gln Gly Leu Val His Pro Phe Lys Met Lys Trp Gln Tyr Thr Asn Arg
145 150 155 160

Arg Ala Phe Thr Met Leu Gly Val Val Trp Leu Val Ala Val Ile Val
165 170 175

Gly Ser Pro Met Trp His Val Gln Gln Leu Glu Ile Lys Tyr Asp Phe
180 185 190

Leu Tyr Glu Lys Glu His Ile Cys Cys Leu Glu Glu Trp Thr Ser Pro
195 200 205

Val His Gln Lys Ile Tyr Thr Thr Phe Ile Leu Val Ile Leu Phe Leu
210 215 220

Leu Pro Leu Met Val Met Leu Ile Leu Tyr Ser Lys Ile Gly Tyr Glu
225 230 235 240

Leu Trp Ile Lys Lys Arg Val Gly Asp Gly Ser Val Leu Arg Thr Ile
245 250 255

His Gly Lys Glu Met Ser Lys Ile Ala Arg Lys Lys Lys Arg Ala Val
260 265 270

Ile Met Met Val Thr Val Val Ala Leu Phe Ala Val Cys Trp Ala Pro
275 280 285

Phe His Val Val His Met Met Ile Glu Tyr Ser Asn Phe Glu Lys Glu
290 295 300

Tyr Asp Asp Val Thr Ile Lys Met Ile Phe Ala Ile Val Gln Ile Ile
305 310 315 320

Gly Phe Ser Asn Ser Ile Cys Asn Pro Ile Val Tyr Ala Phe Met Asn
325 330 335

Glu Asn Phe Lys Lys Asn Val Leu Ser Ala Val Cys Tyr Cys Ile Val
340 345 350

Asn Lys Thr Phe Ser Pro Ala Gln Arg His Gly Asn Ser Gly Ile Thr
355 360 365

Met Met Arg Lys Lys Ala Lys Phe Ser Leu Arg Glu Asn Pro Val Glu
370 375 380

Glu Thr Lys Gly Glu Ala Phe Ser Asp Gly Asn Ile Glu Val Lys Leu
385 390 395 400

Cys Glu Gln Thr Glu Glu Lys Lys Lys Leu Lys Arg His Leu Ala Leu
405 410 415

Phe Arg Ser Glu Leu Ala Glu Asn Ser Pro Leu Asp Ser Gly His
420 425 430

(42) INFORMATION FOR SEQ ID NO: 41:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 24 base pairs 



